Abstract

In addition to the equations, physicists use the following additional
difficult-to-formalize property: that the initial conditions and the values
of the parameters must not be abnormal. We will describe a natural
formalization of this property, and show that this formalization is in good
accordance with theoretical physics. At present, this formalization has
been mainly applied to the foundations of physics. However, more practical
applications are potentially possible.

1 Main Idea: In Short 

The traditional mathematical approach to the analysis of physical systems
implicitly assumes that all mathematically possible integers are physically
possible as well, and that all mathematically possible trajectories are
physically possible. Traditionally, this approach has worked well in physics,
but it does not lead to a very good understanding of chaotic systems, which,
as is now known, are extremely important in the study of real-world
phenomena ranging from weather to biological systems.

Kolmogorov was among the first who started, in the 1960s, analyzing the
discrepancy between physical and mathematical possibility. He pinpointed
two main reasons why a mathematically correct solution to the corresponding
system of differential or difference equations may be physically impossible:

• First, the term "random" is understood differently in mathematics and in
physics. For example, in statistical physics, it is possible (the probability
is positive) that a kettle, when placed on a cold stove, will start boiling
by itself. From the viewpoint of a working physicist, however, this is
absolutely impossible. Similarly, a trajectory which requires a highly
improbable combination of initial conditions may be mathematically correct,
but, from the physical viewpoint, it is impossible.

• Second, the traditional mathematical analysis tacitly assumes that all
integers and all real numbers, no matter how large or how small, are
physically possible. From the physical viewpoint, however, a number like
$10^{10^{10}}$ is not physically possible at all, because it exceeds the
number of particles in the Universe. In particular, solutions to the
corresponding systems of differential equations which lead to such numbers
may be mathematically correct, but they are physically meaningless.

Attempts to formalize these restrictions were started by Kolmogorov himself.
At present, these attempts are mainly undertaken by researchers in
theoretical computer science, who face a similar problem of distinguishing
between theoretically possible "algorithms" and feasible practical algorithms
which can produce the results of their computations in reasonable time.

The goal of the present research is to use the experience of formalizing these 
notions in theoretical computer science to enhance the formalization of similar 
constraints in working physics. 

This research is mainly concentrated around the notion of Kolmogorov
complexity. This notion was introduced independently by several people:
Kolmogorov in Russia and Solomonoff and Chaitin in the US. Kolmogorov used
it to formalize the notion of a random sequence. Probability theory describes
most of the physicists' intuition in precise mathematical terms, but it does
not allow us to tell whether a given finite sequence of 0's and 1's is random
or not. Kolmogorov defined the complexity K(x) of a binary sequence x as the
length of the shortest program which produces this sequence. Thus, a sequence
consisting of all 0's or a sequence 010101... has a very small Kolmogorov
complexity, because these sequences can be generated by simple programs,
while for a sequence of results of tossing a coin, probably the shortest
program is to write print(0101...) and thus reproduce the entire sequence.
Thus, when K(x) is approximately equal to the length len(x) of a sequence,
this sequence is random; otherwise, it is not. The best source for
Kolmogorov complexity is the book [p0].
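K(x) itself cannot be computed by any algorithm, but the output of a lossless compressor, together with the fixed decompressor, is a program that produces x, so its length bounds K(x) from above up to an additive constant. The Python sketch below, our own illustration rather than a method from the works cited here, uses the length of a zlib encoding as a crude stand-in for K(x) on the three kinds of sequences just mentioned; the helper name `compressed_bits` and the seed are hypothetical choices.

```python
import random
import zlib


def compressed_bits(bits: str) -> int:
    """Length in bits of a zlib encoding of a 0/1 string.

    Up to an additive constant (the size of a zlib decompressor), this is an
    upper bound on the Kolmogorov complexity K(x); K itself is uncomputable.
    """
    # Pack eight symbols per byte so that a random sequence is not "compressed"
    # merely because each ASCII character '0'/'1' carries only one bit.
    padded = bits + "0" * (-len(bits) % 8)
    packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return 8 * len(zlib.compress(packed, 9))


n = 8000
all_zeros = "0" * n
alternating = "01" * (n // 2)
rng = random.Random(2024)  # fixed seed so that the run is reproducible
coin_tosses = "".join(rng.choice("01") for _ in range(n))

for name, x in [("all zeros", all_zeros), ("0101...", alternating),
                ("coin tosses", coin_tosses)]:
    print(f"{name:12s} len(x) = {n}, compressed = {compressed_bits(x)} bits")
```

Typically the first two sequences compress to a small fraction of len(x), while the coin-toss sequence does not compress at all (zlib even adds a few bytes of overhead). Since a compressor can only overestimate K(x), this test can show that a sequence is not random, but never certify that it is random.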
The definition of K(x) only takes into consideration the length len(p) of a
program p. From the physical viewpoint, it is also important to take into
consideration its running time t(p): if this time exceeds the lifetime of the
Universe, the algorithm makes no practical sense. This development is in line
with Kolmogorov's original idea that some natural numbers which are
mathematically possible (like $10^{10^{10}}$) are not feasible and should
therefore not be treated as physically possible. The corresponding
modifications are also described in the above book. We plan to use these
ideas in physics.
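One standard way to charge a program for its running time is Levin's measure $Kt(x) = \min\{\mathrm{len}(p) + \log_2 t(p) : p \text{ produces } x\}$; we use it here only as an example of such a modification, without claiming that it is the exact variant the authors have in mind. The Python sketch compares the plain and the time-bounded scores on two hand-made candidate programs for the same string; the candidate names, lengths, and step counts are hypothetical numbers, not outputs of an actual universal machine.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical program that outputs x: its length and running time."""
    name: str
    length_bits: int
    steps: int


def plain_k(candidates):
    """K-style score: only the program length counts."""
    best = min(candidates, key=lambda c: c.length_bits)
    return best.name, best.length_bits


def levin_kt(candidates):
    """Levin-style score: program length plus log2 of the running time."""
    best = min(candidates, key=lambda c: c.length_bits + math.log2(c.steps))
    return best.name, best.length_bits + math.log2(best.steps)


# Hypothetical descriptions of one and the same string x.
candidates = [
    # very short, but must search through about 2**2000 possibilities
    Candidate("exhaustive search", length_bits=40, steps=2**2000),
    # long (essentially print(x)), but runs in time proportional to len(x)
    Candidate("literal copy", length_bits=1000, steps=1000),
]

print(plain_k(candidates))   # the short program wins
print(levin_kt(candidates))  # the fast program wins
```

With only the length counted, the short exhaustive-search program wins; once $\log_2 t(p)$ is added, a running time of $2^{2000}$ steps, far beyond the lifetime of the Universe, makes it lose to the longer but fast program.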



2 Main Idea: Brief Philosophical Analysis 

One of the main objectives of science is to provide guaranteed estimates for 
physical quantities. In order to find out how estimates can be guaranteed, let 
us recall how quantities are estimated in physics: 

• First, we must find a physical law that describes the phenomena that we 
are analyzing. For some phenomena, we already know the corresponding 
laws: we know Maxwell's equations for electrodynamics, Einstein's equations
for gravity, Schroedinger's equation for quantum mechanics, etc. (these laws
can usually be deduced from symmetry conditions [...]).
However, in many other cases, we must determine the equations from the 
general theoretical ideas and from the experimental data. Can we guar- 
antee that these equations are correct? If yes, how? 

There is an extra problem here. In some cases, we know the equations,
but we are not sure about the values of the parameters of these
equations. If the theory predicts, e.g., that a dimensionless parameter
is 1, and the experiments confirm it with an accuracy of 0.001, should
we then use exactly 1 or 1 ± 0.001 for a guaranteed estimate? If the
accuracy is good enough, physicists usually use 1. We may want to use
1 ± 0.001 to be on the safe side, but then, for other
parameters of a more general theory (that in this particular theory 
are equal to 0) should we also use their experimental bounds instead 
of the exact value? There are often many possible generalizations, 
and if we take all of them into consideration, we may end up with 
a very wide interval. This is a particular case of the same problem: 
when (and how) can we guarantee that these are the right equations, 
with the correct values of the parameters? 

• Suppose now that we know the correct equations. Then, we need to describe
how we will actually predict the value of the desired quantity. For example,
we can get partial differential equations that describe how exactly the
initial values $\phi(x, t_0)$ of all the fields change in time. Then, to
predict the value of the physical quantity at a later moment of time t, we
must do the following:

(1) determine the values $\phi(x, t_0)$ from the measurement results;
(2) use these values $\phi(x, t_0)$ to predict the desired value.



The problem with this idea is that reconstructing the actual values
$\phi(x, t_0)$ from the results of measurements and observations is an
ill-posed problem [...], in the sense that two essentially different
functions $\phi(x, t_0)$ can be consistent with the same observations. For
example, since all measurement devices are inertial and thus suppress high
frequencies, the functions $\phi(x, t_0)$ and
$\phi(x, t_0) + A \cdot \sin(\omega x)$, where $\omega$ is sufficiently
large, lead to almost identical observations.

A typical example of an ill-posed problem is a problem of recon- 
structing the actual brightness distribution of a celestial object from 
its observed image [...].

Thus, strictly speaking, if we do not have any additional restrictions on
$\phi(x, t_0)$, then for every x, the set of possible values of
$\phi(x, t_0)$ is the entire real line. So, to get a guaranteed interval for
$\phi(x, t_0)$ (and hence, for the desired physical quantity), we need to
use some additional information. The process of using this additional
information to get non-trivial estimates for the solution of the inverse
problem is called regularization [...]. There are several situations where
this additional information is available:



• If we are analyzing familiar processes, then we usually know (more or
less) what the desired function $\phi(x, t_0)$ looks like. For example, we
may know that $\phi(x, t_0)$ is a linear function $C_1 + C_2 \cdot x_1$, or a
sine function $C_1 \cdot \sin(C_2 x_1 + C_3)$, etc. In mathematical terms,
we know that $\phi(x, t_0) = f(x, C_1, \ldots, C_k)$, where f is a known
expression, and the only problem is to determine the coefficients $C_i$.
This is how, for example, the orbits of planets, satellites, comets, etc.,
are computed: the general shape of an orbit is known from Newton's theory,
so we only have to estimate the parameters of a specific orbit. In such
cases, the existence of several other functions $\phi(x, t_0)$ that are
consistent with the same observations is not a big problem, because we
choose only the functions that are expressed by the formula
$f(x, C_1, \ldots, C_k)$. This is not, however, a frequent situation in
physics, because one of the main objectives (and the main challenges) of
physics is to analyze new phenomena, new effects, qualitatively new
processes, and in these cases no prior expression f is known.

• In some cases, we know the statistical characteristics of the
reconstructed quantity $\phi(x, t_0)$ and the statistical characteristics of
the measurement errors. In these cases, we can formulate the problem of
choosing the maximally probable $\phi(x, t_0)$, and end up with one of the
methods of statistical regularization, or filtering (the Wiener filter is
one example of this approach).

If we do not have this statistical information, but we know, e.g., that
the average rate of change of x(t) is smaller than some constant $\Delta$
(i.e., $\sqrt{\int \dot x(t)^2\,dt} < \Delta$), then we can apply
regularization methods proposed by A. N. Tikhonov and others [...].

In many cases, we do not have the desired statistical information.
However, we may have some expert knowledge. For example, if we want to
know how the temperature x(t) on a planet changes with time t, then the
experts can tell that most likely, x(t) is limited by some value M, and
that the rate $\dot x(t)$ with which the temperature changes is typically
(or "most likely", etc.) limited by some value $\Delta$, etc. We can also
have some expert knowledge about the error with which we perform our
measurements, so the resulting expert knowledge about the value of the
measured quantity y looks like "the difference between the measured value
$\tilde y$ and the actual value y is, most likely, not bigger than
$\delta$" (where $\delta$ is a positive real number given by an expert).
The importance of this information is stressed in [...] and in Chapter 5 of
[...]. The methods of using this information and their application to
testing airplane and spaceship engines are described in [...].

In many cases, we do not have any quantitative expert information like the
one we described. In these cases, it is usually recommended to use some
heuristic (or semi-heuristic) regularization techniques [...], [36], [...],
[37], [38], [58], [17]. These methods often lead to reasonable results, but
they do not give any guaranteed estimate for the reconstructed value
$\phi(x, t_0)$.

• Suppose that we have the equations, and that we have chosen an
appropriate regularization for these equations. Then, in principle, we can
get a guaranteed estimate. The problem is that the numerical methods that
physicists currently use do not give us such guaranteed estimates. For
example, we may have an iterative procedure for solving the equation, in
which we stop if the next iteration is close to the previous one. The fact
that the iterations are close may mean that we are close to the actual
solution, but how close are we? In other words, how can we get a
guaranteed estimate for the solution that is obtained by a heuristic
method? For some equations, such methods are known [18], [19], [20], but
these methods are far from being general.
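The ill-posedness described in the second step above can be seen in a small numerical experiment. In the following Python sketch, which is our own illustration, an inertial measuring device is modeled, by assumption, as the average of $\phi$ over a window of unit width; the functions `measure` and `phi` and all numbers are hypothetical. When the window contains a whole number of periods of $\sin(\omega x)$, the readings for $\phi$ and $\phi + A \sin(\omega x)$ coincide up to rounding, although the two fields differ by as much as A at some points.

```python
import math


def measure(field, x, width=1.0, samples=64):
    """Hypothetical inertial sensor: midpoint-rule average of field on [x, x + width]."""
    step = width / samples
    return sum(field(x + (j + 0.5) * step) for j in range(samples)) / samples


def phi(x):
    """Some smooth 'true' field at time t0 (an arbitrary choice)."""
    return x * x


A = 10.0                  # amplitude of the high-frequency perturbation
omega = 2 * math.pi * 5   # exactly 5 periods fit into the unit window


def phi_perturbed(x):
    return phi(x) + A * math.sin(omega * x)


sensor_positions = [0.1 * k for k in range(20)]
obs_gap = max(abs(measure(phi, x) - measure(phi_perturbed, x))
              for x in sensor_positions)
field_gap = max(abs(phi(x) - phi_perturbed(x))
                for x in [0.05 * k for k in range(40)])

print(f"largest difference in readings: {obs_gap:.1e}")
print(f"largest difference in fields:   {field_gap:.1f}")
```

Since no procedure that sees only these readings can tell the two fields apart, some additional information of the kinds listed above is needed to choose between them.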
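For the case of a known bound on the rate of change, one common discrete form of Tikhonov regularization (a simplification of ours, not necessarily the variant used in the works cited above) replaces the hard bound by a penalty: given noisy samples $y_1, \ldots, y_n$, choose $x$ minimizing $\sum_i (x_i - y_i)^2 + \alpha \sum_i (x_{i+1} - x_i)^2$. Setting the gradient to zero gives $(I + \alpha D^\top D)\,x = y$, where D is the first-difference matrix; this system is tridiagonal, so the Python sketch below solves it with the Thomas algorithm. The helper names, the test signal, and $\alpha = 20$ are hypothetical choices.

```python
import math
import random


def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm for a tridiagonal linear system.

    sub[i] is the entry in row i + 1, column i; sup[i] is row i, column i + 1.
    Assumes the matrix is diagonally dominant, so no pivoting is needed.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = sup[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - sub[i - 1] * c[i - 1]
        c[i] = sup[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - sub[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def tikhonov_smooth(y, alpha):
    """Minimize sum (x_i - y_i)^2 + alpha * sum (x_{i+1} - x_i)^2.

    The minimizer solves (I + alpha * D^T D) x = y, D = first differences.
    """
    n = len(y)
    if n < 2:
        return list(y)
    diag = [1.0 + alpha * (1 if i in (0, n - 1) else 2) for i in range(n)]
    off = [-alpha] * (n - 1)
    return solve_tridiagonal(off, diag, off, list(y))


def roughness(x):
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))


def rms(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


rng = random.Random(1)
truth = [math.sin(2 * math.pi * i / 100) for i in range(100)]
noisy = [t + rng.gauss(0.0, 0.3) for t in truth]
smoothed = tikhonov_smooth(noisy, alpha=20.0)

print(f"rms error: noisy {rms(noisy, truth):.3f}, smoothed {rms(smoothed, truth):.3f}")
print(f"roughness: noisy {roughness(noisy):.2f}, smoothed {roughness(smoothed):.2f}")
```

A larger $\alpha$ enforces a smaller rate of change, and $\alpha = 0$ returns the data unchanged. Choosing $\alpha$ by hand, as here, is itself a heuristic step.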
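For one class of equations, the question "how close are we?" from the third step above has a classical answer. If the iteration $x_{k+1} = g(x_k)$ uses a map g that is a contraction with a known Lipschitz constant q < 1 on a set that g maps into itself (the setting of the Banach fixed-point theorem), then $|x_k - x^*| \le \frac{q}{1-q}\,|x_k - x_{k-1}|$, so the usual stopping test turns into a guaranteed error bound. The Python sketch applies this to $g(x) = \cos x$ on [0, 1], where $q = \sin 1$ is a valid constant. It is our illustration, not necessarily one of the methods cited above, and it ignores floating-point rounding, which a fully rigorous version would also have to enclose; the function name is hypothetical.

```python
import math


def iterate_with_bound(g, x0, q, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration x_{k+1} = g(x_k) with an a-posteriori error bound.

    Assumes g is a contraction with Lipschitz constant q < 1 on a set that g
    maps into itself and that contains x0. Returns (x_k, bound), where
    |x_k - x*| <= bound holds in exact arithmetic.
    """
    if not 0 <= q < 1:
        raise ValueError("q must satisfy 0 <= q < 1")
    x_prev, x = x0, g(x0)
    for _ in range(max_iter):
        bound = q / (1 - q) * abs(x - x_prev)
        if bound <= tol:
            return x, bound
        x_prev, x = x, g(x)
    raise RuntimeError("no convergence within max_iter steps")


# cos maps [0, 1] into [cos 1, 1], and |cos'(x)| = |sin x| <= sin 1 there.
x, bound = iterate_with_bound(math.cos, x0=0.5, q=math.sin(1.0))
print(f"x = {x:.12f}, guaranteed |x - x*| <= {bound:.1e}")
```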

There are several successful applications of interval methods (i.e., computational 
methods which provide guaranteed estimates) to physics: 
• Stability of the Solar system and similar systems: the KAM
(Kolmogorov-Arnold-Moser) method proves that for sufficiently small values
of some parameter, the Solar system is stable. The upper bound on this
parameter obtained by this method is much lower than the actual value [...].
In [77], [61], [32], [63], [65], interval computations are used to find
upper bounds for stability that, for some systems, cover up to 90% of the
actual stability zone.

• Relativistic stability of matter [...]: for the relativistic version of
the Schroedinger equation, it is proved that for sufficiently small
charges, the energy spectrum is non-negative (i.e., for $N \to \infty$, the
system does not collapse). The estimate for the charge is close to the one
for which the collapse actually occurs.

• Asymptotic energy of atoms [80], [81], [...] (based on the Thomas-Fermi
equation).
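To make "guaranteed estimates" concrete, here is a minimal sketch of interval arithmetic in Python; it is our own illustration, and serious work would use a dedicated interval library. Each operation returns an interval guaranteed to contain every possible exact result, and outward rounding with `math.nextafter` (available since Python 3.9) keeps the enclosure valid despite floating-point rounding. The class name `Interval` is hypothetical.

```python
import math
from dataclasses import dataclass


def _down(v: float) -> float:
    """Next float toward -infinity (outward rounding of a lower bound)."""
    return math.nextafter(v, -math.inf)


def _up(v: float) -> float:
    """Next float toward +infinity (outward rounding of an upper bound)."""
    return math.nextafter(v, math.inf)


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("empty interval")

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(_down(self.lo + other.lo), _up(self.hi + other.hi))

    def __sub__(self, other):
        other = _as_interval(other)
        return Interval(_down(self.lo - other.hi), _up(self.hi - other.lo))

    def __mul__(self, other):
        other = _as_interval(other)
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(_down(min(products)), _up(max(products)))

    __radd__ = __add__
    __rmul__ = __mul__

    def __rsub__(self, other):
        return _as_interval(other) - self

    def contains(self, v: float) -> bool:
        return self.lo <= v <= self.hi


def _as_interval(v):
    """Treat a plain number as a degenerate interval [v, v]."""
    return v if isinstance(v, Interval) else Interval(v, v)


x = Interval(0.0, 1.0)
enclosure = x * (1 - x)  # contains t * (1 - t) for every t in [0, 1]
print(enclosure)
```

The computed enclosure is slightly wider than [0, 1] because of outward rounding, while the true range of x(1 - x) on [0, 1] is [0, 1/4]: naive interval evaluation is guaranteed, but it can overestimate, because the two occurrences of x are treated as independent.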

All these applications and the corresponding methods are domain-specific.
What can we do to get guaranteed estimates in the general case?
There are two possible approaches to this problem: 

• A pessimistic approach is that we will never be able to get guaranteed
estimates. This approach is typical in statistics. For example, the
well-known statistician R. A. Fisher says that a "hypothesis is never proved
or established, but is possibly disproved, in the course of experimentation"
([...], p. 16; for a modern description of this viewpoint, see, e.g., [...]).
Strictly speaking, from this viewpoint, we cannot even say that a theory is
disproved with a guarantee. Indeed, if, e.g., a theory predicts 1, and the
measurement has led to 2, then, no matter how small the standard deviation
of the measurement error is, the probability that the difference is caused
by the measurement error is non-zero, and so it is possible that the theory
is still correct.

• An optimistic approach, which most physicists hold, is that we can make
guaranteed conclusions from experiments. A disproved theory is wrong, and
the chance that the discrepancy was caused by the measurement error is as
small as the chance of finding the cards in order after a thorough
shuffling, or of winning the lottery every time by guessing the outcome: it
is impossible.
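The pessimistic argument can be quantified under an assumption we add only for illustration: that the measurement error is Gaussian with mean 0 and standard deviation $\sigma$. The probability that the error alone accounts for a discrepancy of at least $|2 - 1| = 1$ is then $\operatorname{erfc}(1/(\sigma\sqrt{2}))$, which is positive for every $\sigma > 0$ but quickly becomes astronomically small. The function name below is hypothetical.

```python
import math


def prob_error_at_least(discrepancy: float, sigma: float) -> float:
    """P(|e| >= discrepancy) for a measurement error e ~ Normal(0, sigma**2)."""
    return math.erfc(discrepancy / (sigma * math.sqrt(2.0)))


for sigma in (0.5, 0.2, 0.1):
    p = prob_error_at_least(1.0, sigma)
    print(f"sigma = {sigma}: probability = {p:.2e}")
```

For $\sigma = 0.1$ the value is of order $10^{-23}$; for much smaller $\sigma$ it underflows to 0.0 in double precision, even though the exact probability is still positive.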

In this paper, we will describe a formalization of the optimistic approach. 



3 Main Idea in Detail: The Notion of "Not Abnormal" and How To Formalize It
3.1 Physicists Assume That Initial Conditions And Values Of Parameters Are
Not Abnormal

To a mathematician, the main content of a physical theory is its equations.
The fact that the theory is formulated in terms of well-defined mathematical
equations means that the actual field must satisfy these equations. However,
this fact does not mean that every solution of these equations has physical
meaning. Let us give several examples:

• At any temperature greater than absolute zero, particles are moving
randomly. It is theoretically possible that all the particles start moving in
one direction, and, as a result, the chair that I am sitting on starts
lifting up into the air. The probability of this event is small (but
positive), so, from the purely mathematical viewpoint, we can say that this
event is possible but highly improbable. However, physicists say plainly
that such an abnormal event is impossible (see, e.g., [...]).

• Another example from statistical physics: suppose that we have a
two-chamber container. The left chamber is empty, the right one has gas in
it. If we open the door between the chambers, then the gas will spread
evenly between the two chambers. It is theoretically possible (under
appropriately chosen initial conditions) that gas that was initially evenly
distributed would concentrate in one chamber, but physicists believe this
abnormal event to be impossible. This is a typical example of what
physicists call irreversible processes: on the atomic level, all equations
are invariant with respect to reversing the direction of time
($t \to -t$). So, if we have a process that goes from state A to state B,
then, if at B we reverse the velocities of all the atoms, we will get a
process that goes from B to A. However, in real life, many processes are
clearly irreversible: an explosion can shatter a statue, but it is hard to
imagine the inverse process: an implosion that glues the shattered pieces
together into a statue. Boltzmann himself, the 19th century founder of
statistical physics, explicitly stated that such inverse processes "may be
regarded as impossible, even though from the viewpoint of probability theory
that outcome is only extremely improbable, not impossible." [...] (for
similar citations from other founding fathers of statistical physics, see
[...]).

• If we flip a coin 100 times in a row and get heads every time, then a
person who is knowledgeable in probability would say that this is possible,
while a physicist (and any person who uses common sense reasoning) would say
that the coin is not fair, because if it were a fair coin, then this
abnormal event would be impossible. To illustrate this point, G. Polya in
[76] cites the following anecdote from the treatise of J. Bertrand on
probabilities:

One day in Naples the reverend Galiani saw a man from the Basilicata
who, shaking three dice in a cup, wagered to three sixes; and, in fact,
he got three sixes right away. Such luck is possible, you say. Yet
the man succeeded a second time, and the bet was repeated. He put 
back the dice in the cup, three, four, five times, and each time he 
produced three sixes. "Sangue di Bacco", exclaimed the reverend, 
"the dice are loaded!" And they were. 

• In all the above cases, we knew something about probability. However,
there are examples of this type of reasoning in which probability does not
enter the picture at all. For example, in general relativity, it is known
that for almost all initial conditions (in some reasonable sense) the
solution has a singularity point. From this, physicists conclude that the
solution that corresponds to the geometry of the actual world has a
singularity (see, e.g., [67]): the reason is that the initial conditions
that lead to a non-singular solution are abnormal (atypical), and the
actual initial conditions must be not abnormal.

In all these cases, the physicists (implicitly or explicitly) require that
the actual values of the fields must not only satisfy the equations, but
must also satisfy an additional condition: that the initial conditions are
not abnormal.



3.2 The Notion of "Not Abnormal" Is Difficult to Formalize

At first glance, it looks like in the probabilistic case, this property has
a natural formalization: if the probability of an event is small enough
(say, $< p_0$ for some very small $p_0$), then this event cannot happen. For
example, the probability that a fair coin falls heads 100 times in a row is
$2^{-100}$, so, if we choose $p_0 > 2^{-100}$, then we will be able to
conclude that such an event is impossible. The problem with this approach
is that every sequence of heads and tails has exactly the same probability.
So, if we choose $p_0 > 2^{-100}$, we will thus exclude all possible
sequences of heads and tails as physically impossible. However, anyone can
toss a coin 100 times, and thus prove that some sequences are physically
possible.
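The argument can be checked with exact rational arithmetic. Assuming a fair coin with independent tosses, the Python sketch below confirms that one particular random-looking sequence of 100 tosses has exactly the same probability $2^{-100}$ as 100 heads, so any threshold $p_0 > 2^{-100}$ rules out both; in fact, it rules out all $2^{100}$ sequences, whose total probability is 1. The seed and the choice $p_0 = 2 \cdot 2^{-100}$ are arbitrary.

```python
import random
from fractions import Fraction

n = 100
p_all_heads = Fraction(1, 2) ** n

rng = random.Random(7)
some_sequence = "".join(rng.choice("HT") for _ in range(n))
# Under a fair coin each toss has probability 1/2 independently,
# so every specific sequence of length n has the same probability.
p_some_sequence = Fraction(1)
for _ in some_sequence:
    p_some_sequence *= Fraction(1, 2)

p0 = 2 * p_all_heads  # one possible threshold p0 > 2**-100

print(p_all_heads == p_some_sequence)  # True
print(p_some_sequence < p0)            # True: this sequence is "impossible" too
print(2 ** n * p_all_heads)            # 1: total probability of all excluded sequences
```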

Historical comment. This problem was first noticed by Kyburg under the name
of the Lottery paradox [...]: in a big (e.g., state-wide) lottery, the
probability of winning the Grand Prize is so small that a reasonable person
should not expect it. However, some people do win big prizes (for a recent
discussion of this paradox, see, e.g., [14], [21], [75]).



3.3 How to Formalize The Notion of "Not Abnormal": Idea

"Abnormal" means something unusual, rarely happening: if something is rare
enough, it is not typical ("abnormal"). Let us describe what, e.g., an
abnormal height may mean. If a person's height is > 6 ft, it is still normal
(although it may be considered abnormal in some parts of the world). Now, if
instead of 6 ft, we consider 6 ft 1 in, 6 ft 2 in, etc., then sooner or later
we will end up with a height h such that everyone who is taller than h will
definitely be called a person of abnormal height. We may not be sure what
exact value h experts will call "abnormal", but we are sure that such a
value exists.

Let us express this idea in general terms. We have a Universe of discourse,
i.e., a set U of all objects that we will consider. Some of the elements of
the set U are abnormal (in some sense), and some are not. Let us denote the
set of all elements that are typical (not abnormal) by T. On the set U, we
have a decreasing sequence of sets
$A_1 \supseteq A_2 \supseteq \ldots \supseteq A_n \supseteq \ldots$ with the
property that $\bigcap_n A_n = \emptyset$. In the above example, U is the set
of all people, $A_1$ is the set of all people whose height is > 6 ft, $A_2$
is the set of all people whose height is > 6 ft 1 in, $A_3$ is the set of
all people whose height is > 6 ft 2 in, etc. We know that if we take a
sufficiently large n, then all elements of $A_n$ are abnormal (i.e., none of
them belongs to the set T of not abnormal elements). In mathematical terms,
this means that for some n, we have $A_n \cap T = \emptyset$.

In the case of a coin, U is the set of all infinite sequences of results of
flipping a coin, and $A_n$ is the set of all sequences that start with n
heads but have some tail afterwards. Here, $\bigcap_n A_n = \emptyset$.
Therefore, we can conclude that there exists an n for which all elements of
$A_n$ are abnormal. So, if we assume that in our world, only not abnormal
initial conditions can happen, we can conclude that for some n, the actual
sequence of results of flipping a coin cannot belong to $A_n$. The set $A_n$
consists of all sequences that start with n heads and have a tail after
that. So, the fact that the actual sequence does not belong to $A_n$ means
that if the actual sequence starts with n heads, then it consists of all
heads. In plain words, if we have flipped a coin n times, and the results
are n heads, then this coin is biased: it will always fall on heads.
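For a finite collection of sequences, the condition "$A_N \cap T = \emptyset$ for some N" can be checked directly. In this toy Python sketch, which is our own illustration, each infinite coin sequence is summarized by the position of its first tail (None if it is all heads), so that it belongs to $A_n$ exactly when its first tail occurs after position n. For a finite set T the smallest suitable N is simply the latest first-tail position in T; the real content of the idea is that such an N must exist even when T is infinite, which no finite computation can show. The function names and the set T are hypothetical.

```python
from typing import Iterable, Optional

FirstTail = Optional[int]  # 1-based position of the first tail; None = all heads


def in_A(n: int, first_tail: FirstTail) -> bool:
    """A_n: sequences that start with n heads but contain a tail somewhere."""
    return first_tail is not None and first_tail > n


def smallest_cutoff(typical: Iterable[FirstTail]) -> int:
    """Smallest N with A_N disjoint from a finite set T of sequences."""
    return max((t for t in typical if t is not None), default=0)


T = [3, 1, None, 7, 2]  # hypothetical "not abnormal" sequences
N = smallest_cutoff(T)
print(N)  # 7: a sequence in T that starts with 7 heads must be all heads
```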

Let us describe this idea in mathematical terms. 



3.4 Formal Definition [27]

To make formal definitions, we must fix a formal theory: e.g., the set
theory ZF (the definitions and results will not depend on which exact
theory we choose).



Definition 1. We say that a set S is definable if in ZF, there exists a
formula P(x) with one free variable x such that $P(x) \iff x \in S$.

Comment. Crudely speaking, a set is definable if we can define it in ZF. The
set of all real numbers, the set of all solutions of a well-defined
equation, every set that we can describe in mathematical terms is definable.
This does not mean, however, that every set is definable: indeed, every
definable set is uniquely determined by a formula P(x), i.e., by a text in
the language of set theory. There are only denumerably many texts and
therefore, there are only denumerably many definable sets. Since, e.g.,
there are more than denumerably many sets of integers, some of them are
thus not definable.

Definition 2. We say that a sequence of sets $A_1, \ldots, A_n, \ldots$ is
definable if in ZF, there exists a formula P(n, x) such that
$x \in A_n \iff P(n, x)$.

Definition 3.

• Let a set U be given. We will call it a universal set.

• A non-empty set $T \subseteq U$ is called a set of typical (not abnormal)
elements if for every definable sequence of sets $A_n$ for which
$A_n \supseteq A_{n+1}$ and $\bigcap_n A_n = \emptyset$, there exists an N
for which $A_N \cap T = \emptyset$.

• If $u \in T$, we will say that u is not abnormal.

• For every property P, we say that "normally, for all u, P(u)" if P(u) is
true for all $u \in T$.

3.5 Existence Theorems 

The trivial existence result is: for every set U, there is a set of typical
elements T that satisfies Definition 3: indeed, we can take a one-element
set $T = \{u\} \subseteq U$.

A more interesting existence result appears if we take into consideration
the fact that our definition did not completely capture the following
property of the notion of "abnormal": that exceptions (i.e., abnormal
elements) should be rare. "Rare" usually means that the probability of an
element being abnormal should be small enough (i.e., $< \varepsilon$ for
some given $\varepsilon > 0$). We may not know the exact probabilities, so we
may want to choose the set T in such a way that exceptions will be rare no
matter which probability measure we choose. To describe this situation, we
thus need to fix a real number $\varepsilon > 0$ and a finite sequence of
probability measures $p_1, \ldots, p_m$. The only problem with this idea is
that definable sets may be non-measurable. Therefore, in order to apply it,
we modify Definition 3 so that it allows only sequences $A_n$ whose
elements are measurable w.r.t. the given measures.

Definition 3'.

• Let a set U be given. We will call it a universal set. Let
$p_1, \ldots, p_m$ be probability measures on U.

• A non-empty set $T \subseteq U$ is called a set of
$(p_1, \ldots, p_m)$-typical elements if for every definable sequence of sets
$A_n$ for which $A_n \supseteq A_{n+1}$, $\bigcap_n A_n = \emptyset$, and all
sets $A_n$ are measurable w.r.t. each measure $p_i$, there exists an N for
which $A_N \cap T = \emptyset$.

PROPOSITION 1. Assume that we have a set U, m probability measures
$p_1, \ldots, p_m$ on U, and a real number $\varepsilon > 0$. Then, there
exists a set T of $(p_1, \ldots, p_m)$-typical elements for which
$p_i(T) > 1 - \varepsilon$ for all i.
Comment. For the reader's convenience, all the proofs are given in a special
section at the end of the paper.

In other words, it is possible to define abnormal elements in such a way that
for each of the m measures, the probability of an element being abnormal is
$< \varepsilon$.
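One standard way to obtain such a set (we do not claim it reproduces the proof given at the end of the paper) uses the fact that there are only countably many definable sequences $A^{(1)}, A^{(2)}, \ldots$: since each has empty intersection, continuity of measure gives an $N_k$ with $p_i(A^{(k)}_{N_k}) \le \varepsilon / 2^{k+1}$ for all i, and $T = U \setminus \bigcup_k A^{(k)}_{N_k}$ then loses at most $\varepsilon / 2$ of each measure. The Python sketch below carries this out for a finite universe, two measures, and four hand-picked decreasing families; all of these choices, and the function name, are hypothetical.

```python
def build_typical_set(universe, measures, families, eps):
    """Remove A^(k)_{N_k} from the universe for each family k.

    universe: finite list of points.
    measures: list of dicts mapping each point to its probability.
    families: list of functions n -> set of points, each decreasing in n
              with empty intersection (so eventually empty on a finite universe).
    Returns the typical set T and the chosen cutoffs N_k.
    """
    removed, cutoffs = set(), []
    for k, family in enumerate(families, start=1):
        budget = eps / 2 ** (k + 1)  # the budgets sum to less than eps / 2
        n = 0
        while any(sum(p[u] for u in family(n)) > budget for p in measures):
            n += 1
        cutoffs.append(n)
        removed |= family(n)
    typical = [u for u in universe if u not in removed]
    return typical, cutoffs


U = list(range(1, 201))
uniform = {u: 1 / len(U) for u in U}
weights = {u: 0.5 ** u for u in U}
total = sum(weights.values())
geometric = {u: w / total for u, w in weights.items()}

# Hypothetical definable families: A^(k)_n = {u in U : u > n and k divides u}.
families = [lambda n, k=k: {u for u in U if u > n and u % k == 0}
            for k in range(1, 5)]

eps = 0.1
T, cutoffs = build_typical_set(U, [uniform, geometric], families, eps)
for p in (uniform, geometric):
    print(sum(p[u] for u in T))  # each value exceeds 1 - eps
```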

Now that we have the definition, let us show that this notion can indeed
help to give guaranteed estimates.

4 Based On Finitely Many Experiments, We 
Can Guarantee That The Theory Is Correct 

4.1 General Result 

Let us show first that if we assume that the results of experiments are required 
not to be abnormal, then we can (potentially) guarantee that the theory is 
correct after only finitely many experiments. 

From the viewpoint of an experimenter, a physical theory can be viewed as a
statement about the results of physical experiments. If we had an infinite
sequence of experimental results $r_1, \ldots, r_n, \ldots$, then we would be
able to tell whether the theory is correct or not. So, a theory can be
defined as a set of sequences $r_1, \ldots, r_n, \ldots$ that are consistent
with its equations, inequalities, etc. In real life, we only have finitely
many results $r_1, \ldots, r_n$, so we can only tell whether the theory is
consistent with these results or not, i.e., whether there is an infinite
sequence $r_1, \ldots, r_n, \ldots$ that starts with the given results and
satisfies the theory.

It is natural to require that the theory be physically meaningful in the fol- 
lowing sense: if all experiments confirm the theory, then this theory should be 
correct. An example of a theory that is not physically meaningful is easy to give: 
assume that a theory describes the results of tossing a coin, and it predicts that 
at least once, there should be a tail. In other words, this theory consists of all 
sequences that contain at least one tail. Let us assume that actually, the coin 
is so biased that we always have heads. Then, this infinite sequence does not 
satisfy the given theory. However, for every n, the sequence of the first n results 
(i.e., the sequence of n heads) is perfectly consistent with the theory, because 
we can add a tail to it and get an infinite sequence that belongs to the set T. 
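The coin example can be mirrored in a few lines of Python. As a simplification of ours, a theory is represented by a test of whether a finite prefix can still be extended to an infinite sequence that satisfies it. The theory "at least one tail" accepts every all-heads prefix, although the infinite all-heads sequence violates it. For comparison, the hypothetical theory "never two tails in a row" is physically meaningful: a sequence violates it only if some finite prefix already contains "TT". All function names are hypothetical.

```python
def consistent_at_least_one_tail(prefix: str) -> bool:
    # Any finite prefix can be completed by appending a tail later on.
    return True


def consistent_no_two_tails(prefix: str) -> bool:
    # A prefix without "TT" can be completed by alternating H and T forever;
    # a prefix containing "TT" can never be completed.
    return "TT" not in prefix


def all_heads_prefix(n: int) -> str:
    return "H" * n


# Every prefix of the all-heads sequence is consistent with both theories...
print(all(consistent_at_least_one_tail(all_heads_prefix(n)) for n in range(1000)))
print(all(consistent_no_two_tails(all_heads_prefix(n)) for n in range(1000)))
# ...but the infinite all-heads sequence contains no tail, so it violates
# "at least one tail" while satisfying "never two tails in a row".
```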

Let us describe this idea in formal terms. 
Definition 4.

• Let a definable set R be given. Its elements will be called possible
results of experiments. By S, we will denote the set of all possible
sequences $r_1, \ldots, r_n, \ldots$, each element $r_i$ of which is a result
of an experiment (i.e., $r_i \in R$).

• By a theory, we mean a definable subset T of the set S of all infinite
sequences. If $r \in T$, we say that an infinite sequence r satisfies the
theory T, or that for this sequence r, the theory T is correct.

• We say that a finite sequence $(r_1, \ldots, r_n)$ is consistent with the
theory T if there exists an infinite sequence $r \in T$ that starts with
$r_1, \ldots, r_n$. In this case, we will also say that the first n
experiments confirm the theory.

• We say that a theory T is physically meaningful if the following is true:

Let r be a sequence $r \in S$ such that for every n, the results of the
first n experiments from r confirm the theory T. Then, the theory T is
correct for r.

In this case, the universal set consists of all possible infinite sequences
of experimental results, i.e., U = S.

PROPOSITION 2. For every physically meaningful theory T, there exists an
integer N such that normally, if the first N experiments confirm the theory
T, then this theory T is correct.

In other words, if the sequence of results r is not abnormal, and the results 
of first N experiments are consistent with the theory, then the theory is cor- 
rect. This result shows that we can confirm the theory based on finitely many 
observations. 

Philosophical comment: physical induction and its paradoxes. The derivation
of a general theory from finitely many experiments is called physical
induction (as opposed to mathematical induction). There have been many
formalizations of different ideas that physicists use, and these
formalizations have led to successful programs that can find a general
dependency from the cases (see, e.g., [...]). However, in spite of the
success in describing several underlying ideas, general physical induction
is difficult to formalize, to the extent that the prominent philosopher
C. D. Broad called the unsolved problems concerning induction a scandal of
philosophy [...]. We can say that the notion of "not abnormal" justifies
physical induction (and thus resolves the corresponding scandal).

Philosophical comment: Ockham's Razor justified. Ockham's razor is a
principle according to which one should not unnecessarily multiply the
number of entities. It is usually understood as follows: if we have two
properties A and B, and if in all experiments, these two properties
coincide, then we should assume that these two properties are identical.
The above result justifies Ockham's razor principle: namely, if, as a theory
T, we consider the statement that A and B are always identical, then from
Proposition 2, we can conclude that normally, if A and B are identical for
the first N experimental results, then these two properties always coincide.

4.1.1 Abnormal Theories and Related Paradoxes 

The necessary number of experiments differs from theory to theory. In
principle, we can formulate a theory that predicts the same results as our
normal physics until some arbitrarily chosen year Y, and then predicts
something else. For this theory, N must be large enough to reach that year
Y, so the larger Y, the larger N. Such theories are artificial and abnormal.
It turns out that if we restrict ourselves to not abnormal theories, then we
will have a universal bound on N:

Namely, let us assume that on the set U of all pairs (T, r), where T is a
physically meaningful theory and r is a sequence of experimental results, we
have selected a set of typical (not abnormal) pairs. Then, the following
result is true:

PROPOSITION 3. There exists an integer N such that normally, if a phys- 
ically meaningful theory T is confirmed by the first N experiments, then this 
theory is correct. 

In other words, if a pair (T, r) is not abnormal, and the first N experiments
from r confirm the theory T, then T is correct on r.

Philosophical comment: Goodman's paradox explained. Formalization of physical
induction is a difficult task, known to lead to paradoxes. For example, Nelson
Goodman [31], [...] proposed the following paradox. We have observed emeralds
many times, we know that they are green, so we conclude that emeralds are always
green. Instead of the theory "emeralds are green", let us consider an alternative
theory "emeralds are grue", where grue stands for "green before the year 2010,
and blue after the year 2010". Then, all the evidence that we have used to
conclude that emeralds are green also confirms that they are grue. However, to
conclude that emeralds are grue is strange.

There have been several ideas on how to solve this paradox (see, e.g., [...]).
However, as remarked in Chapter 14 of [...], these ideas have not yet led to a
consistent formalization, and hence Goodman himself did not consider this
paradox solved. From the physicist's viewpoint (which Goodman himself explained
in his papers and which other researchers tried to formalize), Goodman's paradox
is not a paradox at all: green is a natural property, while grue is an abnormal
one. The problem is to formalize this distinction. The above formalization
provides exactly this answer to the paradox. Indeed, if a not abnormal theory T
is confirmed by N experiments and is, therefore, correct, and if another theory
T' is confirmed by these same experiments but leads to different predictions,
then, due to Proposition 3, T' is an abnormal theory.

Of course, this is not a complete solution to the related set of problems,
because we still need to find out how to distinguish between normal and
abnormal theories.



4.2 How To Guarantee The Exact Values of Parameters 

In the following text, by $\|x\|$ we mean the Euclidean norm of the vector x.

PROPOSITION 4. Let d be an integer, and let a be a definable point from
$\mathbb{R}^d$. Then, there exists an $\varepsilon > 0$ such that if x is not abnormal and
$\|x - a\| < \varepsilon$, then x = a.

Comment. This result is actually correct for an arbitrary definable metric space 
X with a metric d. 

This means that for every set of typical elements $T \subseteq \mathbb{R}^d$, there exists an
$\varepsilon > 0$ such that if x is not abnormal (i.e., $x \in T$) and x is $\varepsilon$-close to a,
then x = a.

In other words, if we make the assumption (which physicists usually make) that
the actual values of parameters are not abnormal, then it is not true that we can
test a theory with better and better accuracy and never be 100% sure that the
theory is correct: there exists an $\varepsilon > 0$ such that if we have confirmed the
theory with accuracy $\varepsilon$, then this theory is true.

From this result, we can conclude that coincidences are not accidental. Indeed,
suppose that we know a constant c. Now, if in some other (not clearly related)
physical experiments we get a constant that is very close to c, then we can
conclude that this is the same constant. This type of argument is very frequent
in physics: e.g., the discovery that light consists of electromagnetic waves was
prompted by the fact that the computed velocity of these waves turned out to be
very close to the measured speed of light.

Comment. Arguments of the type "This is too improbable to be a mere
coincidence. There must be some reason." are also used in mathematics, to
deduce hypotheses from the observed facts (see, e.g., [76], Chapter XIV,
Section 16).



5 Restriction To "Not Abnormal" Solutions Leads To Regularization Of Ill-Posed Problems



The material described in this section follows [45].

5.1 The Main Result



An ill-posed problem arises when we want to reconstruct the state s from the
measurement results r. Usually, all physical dependencies are continuous, so
small changes of the state s result in small changes in r. In other words, the
mapping $f: S \to R$ from the set of all states to the set of all observations is
continuous (in some natural topology). We consider the case when the measurement
results are (in principle) sufficient to reconstruct s, i.e., the case when the
mapping f is 1-1. That the problem is ill-posed means that small changes in r
can lead to huge changes in s, i.e., that the inverse mapping $f^{-1}: R \to S$ is
not continuous.

We will show that if we restrict ourselves to states s that are not abnormal,
then the restriction of $f^{-1}$ will be continuous, and the problem will become
well-posed.

Definition 5. A definable metric space (X, d) is called definably separable if
there exists a definable everywhere dense sequence $x_n \in X$.
PROPOSITION 5. Let S be a definably separable definable metric space, T
be the set of all not abnormal elements of S, and $f: S \to R$ be a continuous
1-1 function. Then, the restriction of the inverse mapping $f^{-1}: R \to S$ to
$f(T)$ is continuous.

In other words, if we know that we have observed a not abnormal state (i.e.,
that $r = f(s)$ for some $s \in T$), then the reconstruction problem becomes
well-posed. So, if the observations are accurate enough, we get guaranteed
intervals for the reconstructed state s that are as small as we want.

Mathematical comment. This Proposition uses the following Lemma that may 
be of independent interest: 

LEMMA 1. If X is a definably separable definable metric space, and T is the
set of all not abnormal elements of X, then the closure $\overline{T}$ is a compact set.
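A standard textbook example (our own choice, not taken from the source) shows both phenomena. The map $f(\theta) = (\cos\theta, \sin\theta)$ from the non-compact set $[0, 2\pi)$ onto the unit circle is continuous and 1-1, yet its inverse jumps at the point $(1, 0)$. If the states are known to lie in a compact set such as $[0, 2\pi - \delta]$, which here merely plays the role of the closure of $T$ with an arbitrarily chosen $\delta = 0.5$, close observations force close states. The sketch checks both claims numerically, the second one only on a finite grid:

```python
import math


def f(theta: float) -> tuple[float, float]:
    """Continuous 1-1 map from the non-compact set [0, 2*pi) onto the unit circle."""
    return (math.cos(theta), math.sin(theta))


def f_inverse(point: tuple[float, float]) -> float:
    """Reconstruct the unique angle in [0, 2*pi) that f maps to `point`."""
    return math.atan2(point[1], point[0]) % (2 * math.pi)


# Ill-posedness: two states near the two ends of [0, 2*pi) ...
eps = 1e-6
r1, r2 = f(eps), f(2 * math.pi - eps)
observation_gap = math.dist(r1, r2)               # observations about 2e-6 apart
state_gap = abs(f_inverse(r1) - f_inverse(r2))    # reconstructions almost 2*pi apart


def worst_state_gap(delta: float, obs_tol: float, n: int = 400) -> float:
    """Largest |s - t| over grid states s, t in the compact set [0, 2*pi - delta]
    whose observations f(s), f(t) are closer than obs_tol."""
    grid = [i * (2 * math.pi - delta) / n for i in range(n + 1)]
    images = [f(s) for s in grid]
    worst = 0.0
    for s, fs in zip(grid, images):
        for t, ft in zip(grid, images):
            if math.dist(fs, ft) < obs_tol:
                worst = max(worst, abs(s - t))
    return worst


print(f"unrestricted: observations {observation_gap:.1e} apart, states {state_gap:.2f} apart")
print(f"restricted to [0, 2*pi - 0.5]: worst state gap {worst_state_gap(0.5, 0.05):.3f}")
```

On the restricted set, observations closer than 0.05 can only come from states that are themselves close, which is the continuity of the restricted inverse that Proposition 5 asserts in general.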



5.2 How Can We Actually Use This Result to Get Guaranteed Estimates [42]

To actually use this result, we need an expert who will tell us what is abnormal.
We will show that if we use such an expert, then for every computable function
$f: X \to Y$, if we know that $x \in T$, then sufficiently accurate knowledge of $f(x)$
enables us to reconstruct x with any given accuracy.
Definition 6.



• By an expert, we mean a mapping $E: \{A_n\} \to \mathbb{Z}$ that transforms a
definable decreasing sequence with an empty intersection into an integer N
for which $A_N \cap T = \emptyset$ (i.e., for which all elements from $A_N$ are abnormal).

• We say that an output is computable with an expert if it is computable
on a computer that can consult an expert (i.e., that sends an expert a
formula defining $\{A_n\}$ and gets N).

The following definitions are standard in constructive analysis [...], [70].
Definition 7. 

• We say that an algorithm U computes a real number x if for every natural
number k, it generates a rational number $r_k$ such that $|r_k - x| < 2^{-k}$. We
say that we have a computable real number if we have an algorithm U that
computes it.

• By a computable separable metric space, we understand a separable metric
space (X, d) with an everywhere dense sequence $\{x_n\}$ for which there
exists an algorithm that transforms a pair of positive integers n, m into a
computable real number $d(x_n, x_m)$.

• By a computable element of a computable space we understand a pair
consisting of an element $x \in X$ and an algorithm that, given n, returns an
integer m(n) for which $x_{m(n)}$ is a $2^{-n}$-approximation to x.

• Let X and Y be computable separable metric spaces. We say that an
algorithm V computes a function $f: X \to Y$ if V includes calls to an
(unspecified) algorithm U so that when we take as U an algorithm that
computes an element $x \in X$, then V computes the element $f(x) \in Y$.

• We say that a computable function f is constructively continuous on a set
S if there exists an algorithm that, for every $\varepsilon > 0$, generates $\delta > 0$
such that if $|x - y| < \delta$, then $|f(x) - f(y)| < \varepsilon$.
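Two of these notions can be illustrated directly (examples of our own choosing). The function `sqrt2_approx` is an algorithm that computes the real number $\sqrt{2}$ in the above sense, using exact rational bisection; `square_modulus` is a modulus of constructive continuity for $f(x) = x^2$ on $[-1, 1]$, since $|x^2 - y^2| = |x + y| \cdot |x - y| \le 2|x - y|$ there:

```python
from fractions import Fraction


def sqrt2_approx(k: int) -> Fraction:
    """Return a rational r_k with |r_k - sqrt(2)| < 2**-k (exact bisection on [1, 2])."""
    lo, hi = Fraction(1), Fraction(2)      # invariant: lo**2 <= 2 < hi**2
    width = Fraction(1, 2**k)
    while hi - lo >= width:
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo                               # lo <= sqrt(2) < hi and hi - lo < 2**-k


def square_modulus(eps: Fraction) -> Fraction:
    """Given eps > 0, return delta such that |x - y| < delta implies
    |x**2 - y**2| < eps for all x, y in [-1, 1]."""
    return eps / 2


print(float(sqrt2_approx(20)))
```

Exact `Fraction` arithmetic keeps the error bound honest; floating-point bisection would silently stop improving after about 52 halvings.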
Definition 8. Assume that we are given the following information:

• computable separable metric spaces X and Y; 

• a constructively continuous computable 1-1 function $f: X \to Y$ (1-1 means
that if $x \ne x'$, then $f(x) \ne f(x')$);

• an algorithm that, given an integer k, returns a $2^{-k}$-approximation to
f(x), where $x \in T$ is an (unknown) typical element;

• a positive integer l.

We say that an algorithm solves the inverse problem if, given the above
information, it returns a $2^{-l}$-approximation to x. If such an algorithm
exists, then we say that the inverse problem is computable.

PROPOSITION 6. The inverse problem is computable with an expert. 

Comment. 

• This general algorithm can be applied to different numerical problems: to
solving a system of non-linear equations (when X and Y are $\mathbb{R}^k$ for some
k), to solving integral equations (when X and Y are sets of functions),
etc.

• If we do not restrict ourselves to not abnormal elements x, then in many
cases, it will be impossible to have an algorithm for solving the inverse
problem: indeed, if such an algorithm is possible, then the inverse function
$f^{-1}$ is continuous, but, as we have already mentioned, for some continuous
1-1 mappings f from a non-compact set, the inverse is not continuous [84].

• The algorithm described in the proof is general and therefore (like many
general algorithms), when applied to simple problems, it may require
unnecessarily many computation steps. There are cases when simpler methods
are possible: e.g., if the signal that we are trying to reconstruct is a
smooth function, then we can ask an expert for an upper bound on the signal's
energy $\int (x'(t))^2\,dt$, and then use known regularization techniques [...].
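The next sketch illustrates, in the simplest one-dimensional case, why an expert's answer makes the inverse problem solvable; it is not the general algorithm from the proof of Proposition 6. All specifics are our assumptions: $X = Y = \mathbb{R}$, the expert is a hypothetical stub that, asked about the decreasing sequence $A_n = \{x : |x| > n\}$, answers N = 10, and f is continuous and strictly increasing on $[-N, N]$ with $f(-N) \le y \le f(N)$, so plain bisection on that compact interval recovers x from an approximate value of f(x):

```python
from typing import Callable


def hypothetical_expert(query: str) -> int:
    """Hypothetical stand-in for the expert E of Definition 6.

    Asked about the definable decreasing sequence A_n = {x : |x| > n}
    (empty intersection), it answers with an N such that every element
    of A_N is abnormal. Here the answer is simply hard-coded.
    """
    return 10


def invert_with_expert(f: Callable[[float], float], y: float, tol: float) -> float:
    """Approximate x with f(x) = y, assuming f is continuous and strictly
    increasing on [-N, N] and that f(-N) <= y <= f(N)."""
    n = hypothetical_expert("A_n = {x : |x| > n}")
    lo, hi = -float(n), float(n)           # all not abnormal x lie in [-N, N]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid                       # the solution lies in [mid, hi]
        else:
            hi = mid                       # the solution lies in [lo, mid]
    return (lo + hi) / 2


def cubic(x: float) -> float:
    """Example f: continuous and strictly increasing, hence 1-1."""
    return x**3 + x


x_true = 1.5
y_observed = cubic(x_true) + 1e-9          # slightly inaccurate measurement of f(x)
print(invert_with_expert(cubic, y_observed, tol=1e-8))
```

Without the bound N there would be no finite interval to search, which mirrors the role the expert plays in Proposition 6.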

6 If We Impose The Condition That The Actual State Is Not Abnormal, Then We Can Get Guaranteed Estimates Even For Heuristic Numerical Methods
6.1 How To Check Whether A Numerical Method Always Works

To check a numerical method, we can run it on several tests. Usually, the
first tests are simple computer-generated ones. If a method behaves nicely
on these simple tests, then it is tried on realistic or real-life examples, where
the input data come from real experiments. So, we have a (potentially infinite)
sequence of tests. Let us assume that testing is performed in such a way that at
some stage, every part and every aspect of the method is tested. Mathematically,
let us assume that the potentially infinite sequence of test cases has the following
completeness property: if a method works correctly for all (infinitely many) test
cases, then this method is always correct.

Definition 9. 

• Let a definable set $\mathcal{T}$ be given. Its elements will be called tests. By a
testing method S, we mean a set of infinite sequences $t_1, \ldots, t_n, \ldots$ of tests.

• By a numerical method, we mean a definable subset M of the set $\mathcal{T}$ of all
tests. If $t_i \in M$, we say that the method M passed the test $t_i$; otherwise, that
the method M failed the test $t_i$.

• We say that a method is correct if it passes all tests from $\mathcal{T}$.

• Let a class of methods $\mathcal{M}$ be fixed. We say that a testing method S is
complete for methods from the class $\mathcal{M}$ if for every sequence
$(t_1, \ldots, t_n, \ldots) \in S$ and for every method $M \in \mathcal{M}$, if the method M
passes all the tests $t_1, \ldots, t_n, \ldots$, then this method is correct.

PROPOSITION 7. Let a testing method S be complete for a class of numerical
methods $\mathcal{M}$. Then, for every method $M \in \mathcal{M}$, there exists an integer N
such that if a sequence $(t_1, \ldots, t_n, \ldots) \in S$ is not abnormal, and M passes
the first N tests of this sequence, then this method M is correct.

This Proposition justifies the usual testing of a method, in which we draw a
conclusion about its correctness after only finitely many tests. The crucial
assumption here is that the testing sequence is taken from real-life examples,
and these examples are not abnormal.

6.2 How To Get A Guaranteed Estimate For The Result 

In many practical cases, we know a process $x_k$ that is proven to converge to
the desired solution x, but we do not know when to stop in order to guarantee
the given accuracy $\varepsilon$ (i.e., to guarantee that $d(x, x_k) < \varepsilon$).

For example, we may use an iterative method $x_{k+1} = F(x_k)$ to solve the
equation F(x) = x.
In these cases, heuristic methods are used. There are two main groups of
heuristic methods:

• Usually, in iterative methods, if $x_k = x_{k+1}$, then $x_k$ is the required
solution. Therefore, if $x_k$ and $x_{k+1}$ are close, we can conclude that we are
close to the solution. Hence, we stop when consecutive values $x_k$ become
close enough, i.e., when $d(x_k, x_{k+1}) < \delta$ for some $\delta > 0$. This method
is often used in physics if, e.g., we have an expression of x as the sum of an
infinite series (e.g., a Taylor series in perturbation methods). Then, if,
e.g., second-order terms are negligibly small, we neglect quadratic and
higher-order terms, and use the linear expression as an approximation to
the desired solution (see, e.g., [...]).

• If we are solving the equation f(x) = y, then we stop when $f(x_k)$ becomes
close enough to y (i.e., when $d(f(x_k), y) < \delta$ for some $\delta > 0$).

These stopping criteria can be described by the following general definition: 

Definition 10. Let X be a definable metric space, and let S be a definable set
of convergent sequences of X.

• Let $\{x_k\} \in S$, let k be an integer, and let $\varepsilon > 0$ be a real number. We
say that $x_k$ is $\varepsilon$-accurate if $d(x_k, \lim x_p) < \varepsilon$.

• Let $d \ge 1$ be an integer. By a stopping criterion, we mean a function
$c: X^d \to \mathbb{R}_0^+ = \{x \in \mathbb{R} \mid x \ge 0\}$ that satisfies the following two properties:

• If $\{x_k\} \in S$, then $c(x_k, \ldots, x_{k+d-1}) \to 0$ as $k \to \infty$.

• If for some $\{x_k\} \in S$ and for some k, $c(x_k, \ldots, x_{k+d-1}) = 0$, then
$x_k = \cdots = x_{k+d-1} = \lim x_p$.

The two above-described criteria correspond to $c(x, x') = d(x, x')$ and
$c(x) = d(f(x), y)$.
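The two criteria can be sketched on the fixed-point iteration $x_{k+1} = \cos(x_k)$ for the equation $F(x) = x$ with $F = \cos$ (an example of our own choosing). The first function uses $c(x, x') = d(x, x')$; the second uses $c(x) = d(f(x), y)$ with $f(x) = x - \cos x$ and $y = 0$. The code only implements the heuristic tests; the guarantee that the stopped value is $\varepsilon$-accurate is what Proposition 8 provides for not abnormal sequences, and nothing in the code checks it:

```python
import math
from typing import Callable


def iterate_until_steps_small(F: Callable[[float], float], x0: float,
                              delta: float, max_iter: int = 10_000) -> float:
    """Iterate x_{k+1} = F(x_k); stop when d(x_k, x_{k+1}) = |x_{k+1} - x_k| < delta."""
    x = x0
    for _ in range(max_iter):
        x_next = F(x)
        if abs(x_next - x) < delta:
            return x_next
        x = x_next
    raise RuntimeError("stopping criterion not met within max_iter steps")


def iterate_until_residual_small(F: Callable[[float], float],
                                 f: Callable[[float], float], y: float,
                                 x0: float, delta: float,
                                 max_iter: int = 10_000) -> float:
    """Same iteration, but stop when d(f(x_k), y) = |f(x_k) - y| < delta."""
    x = x0
    for _ in range(max_iter):
        if abs(f(x) - y) < delta:
            return x
        x = F(x)
    raise RuntimeError("stopping criterion not met within max_iter steps")


x_a = iterate_until_steps_small(math.cos, 1.0, delta=1e-12)
x_b = iterate_until_residual_small(math.cos, lambda x: x - math.cos(x), 0.0,
                                   1.0, delta=1e-12)
print(x_a, x_b)
```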

PROPOSITION 8. Let c be a stopping criterion. Then, for every $\varepsilon > 0$,
there exists a $\delta > 0$ such that if a sequence $\{x_k\}$ is not abnormal, and
$c(x_k, \ldots, x_{k+d-1}) < \delta$, then $x_k$ is $\varepsilon$-accurate.

So, if we restrict ourselves to not abnormal sequences only (i.e., sequences
that stem from not abnormal physical observations), then
$c(x_k, \ldots, x_{k+d-1}) < \delta$ guarantees that we are $\varepsilon$-close to the desired
solution. In particular, each of the conditions $d(x_k, x_{k+1}) < \delta$ and
$d(f(x_k), y) < \delta$ guarantees that $d(x_k, x) < \varepsilon$. When we are summing a
numerical series $x_k = a_1 + \cdots + a_k$, we have $d(x_k, x_{k+1}) = |a_{k+1}|$, so
this stopping criterion means that if the next term is negligible
($|a_{k+1}| < \delta$), then we are $\varepsilon$-close to the sum: $|x_k - x| < \varepsilon$.
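For a series, the criterion $|a_{k+1}| < \delta$ can be sketched as below, with $e = \sum_{n \ge 0} 1/n!$ as an assumed example (the indexing here starts at $a_0$). The test by itself is only heuristic: the terms of the harmonic series $\sum 1/n$ also become negligible, yet its partial sums diverge, so such sequences lie outside the set S of convergent sequences in Definition 10:

```python
import math
from typing import Callable


def sum_until_next_term_small(term: Callable[[int], float], delta: float,
                              max_terms: int = 100_000) -> float:
    """Return the partial sum x_k = a_0 + ... + a_k for the first k whose next
    term satisfies |a_{k+1}| < delta, i.e. d(x_k, x_{k+1}) < delta."""
    total = term(0)
    for k in range(max_terms):
        next_term = term(k + 1)
        if abs(next_term) < delta:
            return total
        total += next_term
    raise RuntimeError("next term never became negligible")


approx_e = sum_until_next_term_small(lambda n: 1.0 / math.factorial(n), delta=1e-15)
print(approx_e, math.e)
```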
6.3 When Will The Algorithm Stop?

In practice, it is not sufficient to claim that an algorithm generates a guaranteed
estimate. We would also like to know when to expect the result. A computer can
go wrong, so if the computations take too long, we would like to know whether
they are just taking long, or there has been a computer error and we had better
start anew.

Theoretically, arbitrarily long computations are possible. However, as we
will see, computations that are not abnormal do not take too long.

Definition 11. 

• Let a set U be given. Its elements are called computations.

• Let a function $t: U \to \mathbb{R} \cup \{\infty\}$ be given. The value t(u) will be called
the run time of the computation u.

• We say that a computation u terminates if $t(u) < \infty$.

PROPOSITION 9. There exists a number $T_0 > 0$ such that if a computation
u is not abnormal, and it terminates, then its run time is $< T_0$.
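Proposition 9 supports the practical rule mentioned above: if a computation runs longer than $T_0$, suspect an error and start anew. The proposition only asserts that such a $T_0$ exists; the budget value, the counting of run time in steps rather than seconds, and the Euclid example below are all our own assumptions:

```python
from typing import Callable, Optional, TypeVar

State = TypeVar("State")


def run_with_budget(step: Callable[[State], State], state: State,
                    is_done: Callable[[State], bool],
                    max_steps: int) -> Optional[State]:
    """Apply `step` until `is_done` holds and return the final state.

    If more than `max_steps` steps would be needed, return None instead:
    the hypothetical budget max_steps plays the role of T0, and exceeding
    it is treated as a sign of an error, so the caller may start anew.
    """
    steps = 0
    while not is_done(state):
        if steps >= max_steps:
            return None
        state = step(state)
        steps += 1
    return state


def euclid_step(pair: tuple[int, int]) -> tuple[int, int]:
    """One step of Euclid's algorithm: (a, b) -> (b, a mod b)."""
    a, b = pair
    return (b, a % b)


def gcd_done(pair: tuple[int, int]) -> bool:
    return pair[1] == 0


print(run_with_budget(euclid_step, (1071, 462), gcd_done, max_steps=100))
```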

6.4 If a Numerical Method is Polynomial-Time and Not Abnormal, Then It Is Truly Feasible

It is well known that not all algorithms are realistic (see, e.g., [...],
Section 7.1). If an algorithm requires, say, $2^{2^{\mathrm{len}(x)}}$ computational steps
for an input x of length len(x), then for realistic lengths (e.g., for
len(x) = 100) this number of steps will exceed the lifetime of the Universe
(according to modern cosmology). So, if we are interested in separating purely
theoretical algorithms from the ones that can actually be run on computers
(existing or future ones), we must somehow formalize the notion of feasibility.
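A rough back-of-the-envelope check makes this concrete. The figures are our assumptions, not the source's: $10^9$ steps per second and an age of the Universe of about 13.8 billion years. Even a single exponential, $2^{\mathrm{len}(x)}$ steps for $\mathrm{len}(x) = 100$, already takes roughly 2900 times that age:

```python
STEPS_PER_SECOND = 10**9                         # assumed speed of a fast computer
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_S = 13.8e9 * SECONDS_PER_YEAR    # about 4.4e17 seconds (assumed)


def run_time_seconds(steps: int) -> float:
    """Wall-clock time for a given number of steps at the assumed speed."""
    return steps / STEPS_PER_SECOND


steps = 2**100                                   # 2**len(x) for len(x) = 100
ratio = run_time_seconds(steps) / AGE_OF_UNIVERSE_S
print(f"{ratio:.0f} times the age of the Universe")
```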

The most widely used formalization of this notion is that feasible algorithms
are exactly the ones that are time-polynomial, i.e., the ones for which the
running time is limited by some polynomial P(len(x)) of the input length len(x)
(see, e.g., [...], Section 7.4; [...], Ch. 23). There exist formal systems of
reasonable axioms that justify this choice (see, e.g., [79]).

However, most researchers agree that this is not a precise description of a
feasible algorithm, because some time-polynomial algorithms are evidently not
feasible. For example, an algorithm that takes $10^{10^{10}} \cdot \mathrm{len}(x)$ time to
compute is time-polynomial (even linear-time), but it can hardly be called
feasible: even for len(x) = 1, it requires computation time that is
exponentially bigger than the lifetime of the Universe.

There are two possible approaches to this situation: 
• We can view this situation as a problem, and try to come up with a new
definition of feasibility that will really describe only physically feasible
algorithms. Such a formalization is proposed, e.g., in [71].



• In this section, we will pursue another approach: we will show that,
normally, the values of the coefficient and the exponent cannot grow
indefinitely. Namely, there exist C and K such that if we exclude abnormal
methods, then the running time $t_U(x)$ of every not abnormal time-polynomial
algorithm U is limited by $C \cdot (\mathrm{len}(x))^K$.

PROPOSITION 10. There exist $C > 0$ and $K > 0$ such that if a polynomial-time
algorithm U is not abnormal, then its running time $t_U(n)$ is bounded by
$C \cdot n^K$.

In other words, if we take the set of all polynomial-time algorithms as U, and
denote by T the set of all typical elements of U, then there exist K and C such
that the running time of every algorithm from T is bounded by $C \cdot n^K$.

This proposition is in good accordance with the following empirical fact: for
every problem for which a time-polynomial algorithm has been known for some
time (for a few years), within a few years a new algorithm is discovered whose
running time is limited by a cube of n, i.e., by $C \cdot n^3$ (see, e.g., [...]; the
time during which this happens is jokingly called the "incubation period").

7 The Notion Of "Not Abnormal" Is Also Helpful For Foundations of Physics

In the previous sections, we described how the notion of "not abnormal" can lead 
us to guaranteed interval (and, in general, error) estimates. In other words, we 
showed that this notion is helpful in computational physics. In this section, we 
will show that this notion can also help with problems related to the
foundations of physics. These results will show that our formalization is in
good accordance with modern theoretical physics.

Some of these results have been presented in [...], [27].

7.1 Every Physical Quantity is Bounded 

PROPOSITION 11. If U is a definable set, and $f: U \to \mathbb{R}$ is a definable
function, then there exists a number C such that if $u \in U$ is not abnormal,
then $|f(u)| < C$.

If we use the physicists' idea that abnormal initial conditions and/or abnormal 
values of parameters are impossible, then we can make the following conclusions: 
Special relativity. If we take as U the set of all particles, and as f the
velocity, then we can conclude that the velocities of all (not abnormal)
particles are bounded by some constant C. This is exactly what special
relativity says, with the speed of light as C.

Cosmology. If we take the same set U, and take as f the distance from a
particle u to some fixed point in the Universe, then we can conclude that the
distances between particles in the Universe are bounded by a constant C. So,
the Universe is finite. Similarly, if we take as f the time interval between
events, we can conclude that the Universe has a finite lifetime.

Why particles with large masses do not exist. If we take the mass of the
particle as f, then we can conclude that the masses of all particles are bounded
by some constant C. This result explains the following problem:

• Several existing particle classification schemes allow particles with
arbitrarily large masses [53], [...], [...]. E.g., in the Regge trajectory
scheme, particles form families with masses $m_n = m_0 + n \cdot d$ for some
constants $m_0$ and d. When $n \to \infty$, we have $m_n \to \infty$.

• Only particles with relatively small masses have been experimentally
observed (see, e.g., [73]).

Such particles with large masses, which are difficult to weed out using
equations only, can easily be weeded out if we use the notion of "not abnormal".

Dimensionless constants are usually small. This is the reason why physicists
can safely estimate and neglect, e.g., quadratic (or, in general, higher-order)
terms in asymptotic expansions, even though no accurate estimates of the
coefficients of these terms are known [32], [...], [56], [...], [57]. In
particular, such methods are used in quantum field theory, where we add up
several first Feynman diagrams [25], [...], [69], in celestial mechanics [13],
[12], etc.

Comment: Consequences for philosophy of mathematics. Physically meaningful
numbers are bounded. Hence, it seems reasonable to place only physically
meaningful integers in the foundations of mathematics. In the corresponding
formalisms, there will be only finitely many integers. The ideas of such
formalisms were originally developed by Van Dantzig, Essenine-Volpine, and
Kolmogorov [41], and were later transformed into a useful formalism by
Parikh (see [72] and references therein).
7.2 Quantization

PROPOSITION 12. Let U be a definable set, and $f: U \to \mathbb{R}$ be a definable
function. Then, there exists a number $\varepsilon > 0$ such that if u is not abnormal
and $f(u) \ne 0$, then $|f(u)| > \varepsilon$.

Together with the physicists' idea that abnormal situations are impossible, we
can conclude that a physical quantity cannot have arbitrarily small positive
values: there must be a smallest value that is indivisible. This explains, e.g.,
why the electric charge cannot take any value we want: there is a charge
quantum (1/3 of an electron's charge).

Infinities in quantum field theory disappear. This can be justified by the
joint use of quantization and boundedness: indeed, in quantum field theory,
infinities are caused by the fact that we have to integrate over all momenta p
of all particles. Infinities happen because we have to integrate over
$p \to \infty$ and $p \to 0$ (see, e.g., [...]). If we apply the boundedness and
quantization results to momentum, we conclude that p is bounded from above and
from below. Therefore, all the integrals should be finite (another
interval-related argument that makes infinities disappear is given in [39]).

Schroedinger's cat paradox stops being paradoxical. According to traditional
quantum mechanics, states are described by vectors from a Hilbert space $L^2$,
and all vectors have a physical meaning. E. Schroedinger has shown that this
assumption leads to the following paradox (for description and discussions,
see, e.g., [...]): Suppose that we place a cat into a box with a gun aimed at
it. The gun is controlled by a switch, which can be triggered by a
left-polarized photon. If we send a photon in a left-polarized state $s_1$, the
gun fires, and the cat is dead. If we send a photon in a right-polarized state
$s_2$, the gun does not fire, and the cat is alive. Suppose now that we send a
photon in a state s that is a superposition of $s_1$ and $s_2$ (i.e., in
mathematical terms, a linear combination). Equations of quantum mechanics are
linear, so, as a result, we get a state that is a superposition of dead and
alive. Such a superposition is difficult to imagine, because in real life, an
animal is either dead or alive.

This paradox is based on the assumption that all vectors from a Hilbert 
space are physically meaningful states. If we impose the additional condition 
that only not abnormal states are physically possible, then we can exclude some 
states as being abnormal. Indeed, from the quantization result, it follows that
there exists an $\varepsilon > 0$ such that if a physically meaningful state is
$\varepsilon$-close to the "dead" state, then it is the "dead" state. The paradoxical
continuous transition between dead and alive thus disappears.

7.3 Chaos 

The origin of chaos. Restriction to not abnormal states also explains the origin
of the chaotic behavior of physical systems. In mathematical terms, chaos means
that after some time, the states of the system form a so-called strange
attractor, i.e., in topological terms, a completely disconnected set in the
following precise sense:

Definition 12. A set S in a metric space X is called completely disconnected if
for every $s_1, s_2 \in S$ with $s_1 \ne s_2$, there exist open sets $S_1^*$ and $S_2^*$
such that $s_1 \in S_1^*$, $s_2 \in S_2^*$, $S_1^* \cap S_2^* = \emptyset$, and $S \subseteq S_1^* \cup S_2^*$.

In other words, every two distinct points belong to different topological
components of the set S. The relationship between this definition and typical
elements is given by the following result:

PROPOSITION 13. In a definable separable metric space, the set of typical 
elements is completely disconnected. 

So, if we assume (as physicists do) that abnormal states are impossible, then
we immediately arrive at chaotic dynamics.

Spontaneous symmetry violations. Equations of physics have lots of symmetries.
If an equation is, e.g., invariant w.r.t. rotations, and the initial condition
is rotation-invariant, then the solution stays rotation-invariant for all
moments of time. From the mathematical viewpoint, symmetric solutions are
quite possible. However, in real life, we only observe approximately symmetric
solutions. E.g., in cosmology, from the observations of the 3K radiation, we can
conclude that the initial state of the Universe was highly isotropic and
homogeneous (see, e.g., [...]). However, the observed Universe is not. This
means that the initial conditions were only approximately isotropic and
homogeneous.

In each particular case, we may have specific physical reasons for symmetry
violation. Restriction to "not abnormal" leads to a general explanation [...]:
namely, with this restriction, the theory consists not only of the set of
equations, but also of the set T of physically possible (not abnormal) initial
conditions. We are going to show that even if the equations are invariant, the
set T is not.

Definition 13. Let X be a topological space. 

• By a continuous transformation group, we mean a connected continuous
group G with a continuous mapping $a: G \times X \to X$ such that
$a_{g_1 g_2}(x) = a_{g_1}(a_{g_2}(x))$.

• A set S is called invariant w.r.t. G if $a_g(S) = S$ for all $g \in G$ (where
$a_g(S) = \{a_g(s) \mid s \in S\}$).

• We say that the continuous transformation group G is non-trivial if
$a_g(x) \ne x$ for some $g \in G$ and for some not abnormal $x \in X$.

PROPOSITION 14. Let X be a definable separable metric space, T be the set 
of typical elements of X , and let G be a non-trivial continuous transformation 
group. Then, the set T is not invariant w.r.t. G. 
7.4 All The Processes In The World Are Connected



Let us assume that we are studying two physically unrelated processes. Let
$X_1$ denote the set of all possible states (initial conditions) of the first
process, and $X_2$ denote the set of all initial conditions of the second
process. Then, for each process, we can formalize the physicists' idea that
initial conditions cannot be abnormal by saying that $x_1 \in T_1$ and $x_2 \in T_2$,
where $T_i \subseteq X_i$ are the corresponding sets of typical elements.

We could also analyze the two processes as a whole. The state of a pair of
processes can be characterized by a pair of states $(x_1, x_2)$. Since the
processes are unrelated, the state $x_1$ of the first process cannot influence
the state of the second one. So, for every $x_1$, the set of all possible states
$x_2$ of the second process is $X_2$. Therefore, the set of all possible pairs
$(x_1, x_2)$ is equal to the set $X = X_1 \times X_2$ of all pairs $(x_1, x_2)$ with
$x_i \in X_i$. We can now formulate our "not abnormal" idea by saying that
$(x_1, x_2) \in T$, where T is the set of all typical pairs.

Since we assumed that the processes are physically unrelated, it seems that
the choice of the state of the first process should not change which states
are possible for the second one. Therefore, we would expect that the set of
physically possible (not abnormal) pairs coincides with the Cartesian product
$T = T_1 \times T_2$.

In principle, it is possible to find such a T: e.g., we can take a one-point set
T. However, as we will show, if we take into consideration that typical states
must form a majority in some reasonable sense, then such a factorization
$T = T_1 \times T_2$ is no longer possible. In other words, every two processes in
the world are related, even if the equations that describe these processes are
independent.

The strongest result occurs if we consider two identical processes, for which
it is natural to assume that if $(x_1, x_2) \in T$, then $(x_2, x_1) \in T$:
Definition 14. 

• Let X be an arbitrary set. By a permutation, we mean the mapping
$X \times X \to X \times X$ defined as $(x_1, x_2) \mapsto (x_2, x_1)$.

• For arbitrary sets $X_1$ and $X_2$, we say that a set $T \subseteq X_1 \times X_2$ is
factorizable if $T = T_1 \times T_2$ for some $T_i \subseteq X_i$.

• Let p be a probability measure on X that is non-atomic (i.e., $p(\{x\}) = 0$
for all $x \in X$). We say that a set $T_1 \subseteq X$ is a majority set if
$p(T_1) > 1/2$. We say that a set $T \subseteq X \times X$ is a majority set if
$(p \times p)(T) > 1/2$.

In physical terms, factorizable sets correspond to truly independent processes: 
Definition 15.



• Let $X_1$ and $X_2$ be sets. Elements of $X_1$ will be called states of the first
process. Elements of $X_2$ will be called states of the second process. A pair
$(x_1, x_2) \in X_1 \times X_2$ is called a joint state.

• Let $T \subseteq X_1 \times X_2$ be a set of joint states. States from T will be called
physically possible.

• We say that a state $x_1 \in X_1$ of the first process is physically possible if
$(x_1, x_2) \in T$ for some $x_2 \in X_2$.

• We say that a state $x_2 \in X_2$ of the second process is physically possible if
$(x_1, x_2) \in T$ for some $x_1 \in X_1$.

• Let a state $x_1 \in X_1$ be given. We say that, given the state $x_1$, the state
$x_2$ is possible for the second process if $(x_1, x_2) \in T$. The set of all states
of the second process that are possible for a given $x_1$ will be denoted by
$P_2(x_1)$.

• Similarly, we can define $P_1(x_2)$.

• We say that the processes are truly independent if the set of possible
states of the second process does not depend on the state of the first
process (and vice versa), i.e., if $x_1$ and $x_1'$ are both physically possible,
then $P_2(x_1) = P_2(x_1')$.

The following result is easy to prove: 

PROPOSITION 15. Two processes are truly independent iff the set T is
factorizable.
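For finite sets of joint states, the notions of Definitions 14 and 15 can be checked mechanically, which also makes Proposition 15 easy to test. The toy sets below are our own assumptions. A nonempty T can only equal $T_1 \times T_2$ if $T_1$ and $T_2$ are its two projections, so factorizability reduces to comparing T with the product of its projections:

```python
from itertools import product

JointStates = set[tuple[int, int]]


def is_factorizable(T: JointStates) -> bool:
    """T = T1 x T2 for some T1, T2; the only candidates are the projections of T."""
    T1 = {x1 for x1, _ in T}
    T2 = {x2 for _, x2 in T}
    return T == set(product(T1, T2))


def truly_independent(T: JointStates) -> bool:
    """Definition 15: P2(x1) is the same for every physically possible x1,
    and P1(x2) is the same for every physically possible x2."""
    P2: dict[int, set[int]] = {}
    P1: dict[int, set[int]] = {}
    for x1, x2 in T:
        P2.setdefault(x1, set()).add(x2)
        P1.setdefault(x2, set()).add(x1)
    return (len({frozenset(s) for s in P2.values()}) <= 1
            and len({frozenset(s) for s in P1.values()}) <= 1)


def permutation_invariant(T: JointStates) -> bool:
    """Definition 14: (x1, x2) in T implies (x2, x1) in T."""
    return all((x2, x1) in T for x1, x2 in T)


square = set(product({0, 1}, {0, 1}))    # every combination of states is possible
diagonal = {(0, 0), (1, 1)}              # the second state always equals the first
for name, T in [("square", square), ("diagonal", diagonal)]:
    print(name, is_factorizable(T), truly_independent(T), permutation_invariant(T))
```

The diagonal set is permutation invariant but not factorizable, so the two processes it describes are not truly independent.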

PROPOSITION 16. Let X be a definable separable metric space, and let T be
an infinite set of typical elements of $X \times X$ that is invariant w.r.t.
permutation. Then, T is not factorizable (i.e., two processes are not truly
independent).

The proof of this Proposition is based on the following Lemma that may be of 
independent interest: 

LEMMA 2. If $X$ and $Y$ are definable sets, $f: X \to Y$ is a definable mapping, and $T$ is a set of typical elements of $X$, then $f(T)$ is a set of typical elements of $Y$.

In other words, this Lemma says that if an element $x \in X$ is not abnormal, then its image $f(x)$ is also not abnormal.

In the general case, we can usually assume that $X_1 = X_2$ (e.g., in quantum mechanics, the set of all possible states of any system is a Hilbert space $L^2$).

PROPOSITION 17. Let $X$ be a definable separable metric space, and let $T$ be a majority set of typical elements of $X \times X$. Then, $T$ is not factorizable (i.e., the two processes are not truly independent).
PROPOSITION 18. Let $X$ be a definable separable metric space, and let $T_1$ and $T_2$ be majority sets of typical elements of $X$. Then, $T = T_1 \times T_2$ is not a set of typical elements of $X \times X$ (i.e., the processes are not truly independent).

The proof of these statements is based on the following Lemma: 

LEMMA 3. If $T$ is a set of typical elements of $X$, then every non-empty subset of $T$ is also a set of typical elements of $X$.

If we have $s > 2$ systems, then we can prove an even stronger statement:

PROPOSITION 19. Let $X$ be a definable separable metric space, and let $T_1, \ldots, T_s$ be sets of typical elements of $X$ for which $p(T_i) > 1/s$ for all $i$. Then, $T = T_1 \times \cdots \times T_s$ is not a set of typical elements of $X \times \cdots \times X$.

We can generalize the above definition of true independence to $s > 2$ processes, and claim that under the conditions of Proposition 19, these $s$ processes are not truly independent.

This proposition takes into account the fact that "typical" does not necessarily mean "belonging to the majority": e.g., a "typical professor" may combine several features, each of which is typical for a majority, but which, when combined, may be rather rare.
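The "typical professor" remark can be made concrete with a small calculation. This Python sketch assumes, purely for illustration, that the features are statistically independent and that each one is shared by the same fraction `q` of the population; neither assumption comes from the text, and `features_until_rare` is a hypothetical helper. It counts how many majority features can be combined before their combination stops being a majority.

```python
def features_until_rare(q: float, threshold: float = 0.5) -> int:
    """Smallest number k of independent features, each shared by a fraction q
    of the population, whose combination is shared by at most `threshold` of it."""
    k, share = 0, 1.0
    while share > threshold:
        k += 1
        share *= q          # independence assumption: shares multiply
    return k

print(features_until_rare(0.8))   # 0.8**3 = 0.512 > 0.5, but 0.8**4 = 0.4096
print(features_until_rare(0.9))   # 0.9**6 = 0.531..., 0.9**7 = 0.478...
```

So, under these assumptions, four features that each hold for 80% of the population already describe a minority.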

Let us describe how these results are related to theoretical physics. 

EPR paradox. Analyzing quantum mechanics, Einstein, Podolsky and Rosen came to the conclusion that in quantum mechanics, it is potentially possible to have a correlation between the states of spatially separated particles at the same moment of time. This conclusion clearly contradicts special relativity, according to which immediate communication between spatially separated events is impossible. Because of this contradiction, this conclusion was called a paradox (named EPR after the first letters of the authors' names). For a detailed description and references, see, e.g., [74]. The above results show that if we take into consideration the fact that a theory consists not only of equations, but also of initial conditions, then a connection even between spatially separated events is possible, so the EPR paradox is no longer a paradox (other solutions to this paradox are presented in Q).

Interaction between parallel worlds. Modern physics is formulated in terms of probabilities. Because of that, even if we measure everything accurately, we cannot uniquely predict the results of future experiments. One way to describe this is to say that instead of a single world history, there are several possible world histories. Usually, only one history is considered real; all others are viewed as purely mathematical objects. However, starting with Wheeler and Everett, some researchers began to consider the possibility that all world histories are real: one of them describes our world, in which we live; others describe other worlds (these other worlds do not intersect with ours and are therefore called parallel worlds [74]).
If we assume that parallel worlds do not influence our world, then whether we call them real or not is a question of semantics: no experiments in this world are influenced by anything that happens in these parallel worlds. However, some theorists suggested that a small interaction is possible. For example, in 1972, A. Sakharov suggested that the space-times of the worlds do have intersections; these intersections are elementary particles, and observable properties of the particles can be interpreted as topological characteristics of the intersection (for details and references, see, e.g., 0). From the described viewpoint, this is a quite natural idea: as soon as we adopt the model with parallel worlds, we automatically adopt the postulate that what is going on in all these worlds is typical with respect to this theory. Hence, due to the above propositions, what is happening in one of the worlds can influence the others.

8 Proofs 

Proof of Proposition 1. We have already mentioned that there are denumerably many definable sequences of sets. Therefore, there are no more than denumerably many sequences of sets $A_n$ for which $A_n \supseteq A_{n+1}$ and $\bigcap A_n = \emptyset$. So, we can enumerate such sequences. Let us denote the elements of the $k$-th sequence by $A^k_1, A^k_2, \ldots, A^k_n, \ldots$ For every $k$ and for every $i$, from $\bigcap_n A^k_n = \emptyset$ and the $\sigma$-additivity (continuity from above) of the measure $p_i$, it follows that $p_i(A^k_n) \to 0$. This means that there exists an integer $N^k_i$ such that if $n \ge N^k_i$, then $p_i(A^k_n) \le \varepsilon \cdot 2^{-k}$. Let us define $N(k) = \max_i N^k_i$. Then, $N(k) \ge N^k_i$ for all $i$, and therefore, $p_i(A^k_{N(k)}) \le \varepsilon \cdot 2^{-k}$. As $T$, we will take the complement to the set

$$A = \bigcup_{k=1}^{\infty} A^k_{N(k)}.$$

It is easy to see that $T$ is a set of typical elements: indeed, for every sequence $A^k_1, A^k_2, \ldots, A^k_n, \ldots$, we have $T \cap A^k_{N(k)} = \emptyset$. Now, let us show that elements from $A$ are rare. Indeed, for every $i$, we have $p_i(A) \le \sum_k p_i(A^k_{N(k)})$. For every $i$, we have $p_i(A^k_{N(k)}) \le \varepsilon \cdot 2^{-k}$, and therefore, $p_i(A) \le \sum_k (\varepsilon \cdot 2^{-k}) = \varepsilon$. Q.E.D.
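The key step of this proof, choosing $N(k)$ so that the $k$-th set gets a measure budget of $\varepsilon \cdot 2^{-k}$, can be illustrated numerically. This is a sketch under stated assumptions: a single measure $p$ (the uniform measure on $[0, 1]$), a hypothetical family of decreasing sets $A^k_n = (0, 1/(n+k))$ whose intersection over $n$ is empty, and only finitely many sequences $k = 1, \ldots, K$, since a program cannot enumerate all definable sequences. The names `measure_A` and `choose_N` are illustrative only.

```python
from fractions import Fraction

def measure_A(k: int, n: int) -> Fraction:
    """Uniform measure of the hypothetical set A^k_n = (0, 1/(n + k)) in [0, 1]."""
    return Fraction(1, n + k)

def choose_N(k: int, eps: Fraction) -> int:
    """Smallest n >= 1 with p(A^k_n) <= eps * 2^(-k); exists because p(A^k_n) -> 0."""
    n = 1
    while measure_A(k, n) > eps / 2**k:
        n += 1
    return n

eps = Fraction(1, 10)
K = 8  # only the first K sequences are treated in this sketch
N = {k: choose_N(k, eps) for k in range(1, K + 1)}

# Subadditivity: p(A) <= sum_k p(A^k_{N(k)}) <= sum_k eps * 2^(-k) < eps.
union_bound = sum(measure_A(k, N[k]) for k in N)
# All chosen sets are intervals starting at 0, so their union is the longest one.
union_exact = max(measure_A(k, N[k]) for k in N)
print(N[1], N[2], union_exact, union_bound)
```

The geometric budget $\varepsilon \cdot 2^{-k}$ is what keeps the total below $\varepsilon$ no matter how many sequences are added.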
Proof of Proposition 2. As $A_n$, let us take the set of all sequences $r \in S$ for which the first $n$ experiments confirm the theory, but some further experiments do not confirm it. Then, it is easy to show that $A_n \supseteq A_{n+1}$ for all $n$. Since the theory is physically meaningful, we have $\bigcap A_n = \emptyset$. Therefore, there exists $N$ for which $A_N \cap T = \emptyset$, i.e., for which all not abnormal sequences belong to the complement of $A_N$. Due to our definition of $A_N$, $r \notin A_N$ means that if the first $N$ experiments confirm the theory, then this theory is correct. Q.E.D.
Proof of Proposition 3. As $A_n$, take the set of all pairs (theory, $r$) for which the first $n$ experiments from the sequence $r$ confirm the theory, but the theory is not correct on $r$. The proof is similar to the proof of Proposition 2.
Proof of Proposition 4. Let us prove this result for an arbitrary definable metric space $X$ with a metric $d$. As $A_n$, let us take the set of all $x$ for which $0 < d(x, a) < 2^{-n}$. This sequence is decreasing, and $\bigcap A_n = \emptyset$. Therefore, there exists an $N$ for which $A_N \cap T = \emptyset$. This means that none of the elements from $T$ belong to $A_N$. This, in its turn, means that for elements $x \in T$, either $d(x, a) = 0$, or $d(x, a) \ge 2^{-N}$. For $\varepsilon = 2^{-N}$, we get the desired result.

Proof of Lemma 1. A set $S$ in a complete metric space $X$ is compact iff it is closed and, for every $\varepsilon > 0$, it has a finite $\varepsilon$-net, i.e., a finite set $S(\varepsilon)$ with the property that for every $s \in S$, there exists an element $s(\varepsilon) \in S(\varepsilon)$ that is $\varepsilon$-close to $s$.

The closure of $T$ is clearly closed, so, to prove that the closure of $T$ is compact, it is sufficient to prove that it has an $\varepsilon$-net for all $\varepsilon$. For that, it is sufficient to prove that for every $\varepsilon > 0$, there exists an $\varepsilon$-net for $T$ (a finite $\varepsilon$-net for $T$ is automatically a $2\varepsilon$-net for its closure).

If a set $S(\varepsilon)$ is an $\varepsilon$-net for $S$, and $\varepsilon' > \varepsilon$, then, as one can easily see, this same set $S(\varepsilon)$ is also an $\varepsilon'$-net for $S$. Therefore, it is sufficient to show that $\varepsilon$-nets for $T$ exist for $\varepsilon = 2^{-k}$, $k = 0, 1, 2, \ldots$

Let us fix $\varepsilon = 2^{-k}$. Since $X$ is definably separable, there exists a definable sequence $x_1, \ldots, x_i, \ldots$ that is everywhere dense in $X$. As $A_n$, we will take the complement $X \setminus U_n$ to the union $U_n$ of the $n$ closed balls $D_\varepsilon(x_1), \ldots, D_\varepsilon(x_n)$ of radius $\varepsilon$ with centers in $x_1, \ldots, x_n$. Clearly, $A_n \supseteq A_{n+1}$. Since $x_i$ is an everywhere dense sequence, for every $x$, there exists an $i$ for which $x \in D_\varepsilon(x_i)$; therefore, for every $n \ge i$, we have $x \in U_n$, i.e., $x \notin A_n$. Hence, the intersection of all the sets $A_n$ is empty. Therefore, there exists an $N$ for which $A_N \cap T = \emptyset$. This means that $T \subseteq U_N$. This, in its turn, means that the elements $x_1, \ldots, x_N$ form an $\varepsilon$-net for $T$.

So, the set $T$ has an $\varepsilon$-net for $\varepsilon = 2^{-k}$, $k = 0, 1, 2, \ldots$ Hence, the closure of $T$ is compact.

Q.E.D.
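The construction in this proof, covering $T$ by balls around the first $N$ elements of a dense sequence, can be sketched in Python for the special case $X = [0, 1]$. Assumptions (not from the text): the dense sequence is the enumeration of dyadic rationals $0, 1, 1/2, 1/4, 3/4, \ldots$, and $T$ is a finite sample of points, so the existence of $N$ follows from finiteness rather than from typicality. The function names are hypothetical.

```python
from fractions import Fraction
from typing import Iterator, List, Sequence

def dyadic_sequence() -> Iterator[Fraction]:
    """Everywhere dense sequence in [0, 1]: 0, 1, 1/2, 1/4, 3/4, 1/8, 3/8, ..."""
    yield Fraction(0)
    yield Fraction(1)
    level = 1
    while True:
        for num in range(1, 2**level, 2):
            yield Fraction(num, 2**level)
        level += 1

def epsilon_net_prefix(T: Sequence[Fraction], eps: Fraction) -> List[Fraction]:
    """Shortest prefix x_1, ..., x_N whose closed eps-balls cover T (i.e., T is inside U_N)."""
    prefix: List[Fraction] = []
    for x in dyadic_sequence():
        prefix.append(x)
        if all(any(abs(t - c) <= eps for c in prefix) for t in T):
            return prefix
    raise RuntimeError("unreachable: the dense sequence is infinite")

T = [Fraction(1, 3), Fraction(2, 3), Fraction(9, 10)]   # hypothetical sample
net = epsilon_net_prefix(T, Fraction(1, 8))
print([str(c) for c in net])
```

Here the first five dyadic points already form a $1/8$-net for the sample; the returned prefix plays the role of $x_1, \ldots, x_N$.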

Proof of Proposition 5. This proof follows from the known result that if a function $f$ is continuous and 1-1 on a compact set, then its inverse is also continuous (see, e.g., @). In our case, such a function is $f: T \to f(T)$.

Proof of Proposition 6. Due to Proposition 5, the inverse function $f^{-1}$ is continuous on $f(T)$. In particular, for every $l$, there exists a $\delta > 0$ such that if $x, x' \in T$ and $d(f(x), f(x')) < \delta$, then $d(x, x') < 2^{-l}$. If we know $\delta$, then we can compute the desired approximation to $x$ as follows. Since $X$ is definably separable, there exists a definable sequence $x_n$ that is everywhere dense in $X$. Using this sequence, we:

• Compute $f(x)$ with accuracy $\delta/8$; the result of this computation (one of the elements of the everywhere dense sequence $y_m$) will be denoted by $\widetilde{f(x)}$.

• For $n = 1, 2, \ldots$, compute $f(x_n)$ with accuracy $\delta/8$, and for the result $\widetilde{f(x_n)}$ of this computation, compute the distance $d(\widetilde{f(x)}, \widetilde{f(x_n)})$ with accuracy $\delta/8$. When this estimate $\tilde{d}$ satisfies $\tilde{d} \le \delta/2$, we stop and produce $x_n$ as the desired result.

Let us show that this algorithm works.

• First, let us prove that this algorithm stops. Indeed, since the sequence $x_n$ is everywhere dense in $X$, there is a subsequence $x_{n_k}$ that tends to $x$. Since $f$ is continuous, we have $f(x_{n_k}) \to f(x)$. So, there exists a $k$ for which $d(f(x_{n_k}), f(x)) \le \delta/8$. Since $\widetilde{f(x)}$ and $\widetilde{f(x_{n_k})}$ are $(\delta/8)$-approximations to $f(x)$ and $f(x_{n_k})$, we can conclude that
$$d(\widetilde{f(x_{n_k})}, \widetilde{f(x)}) \le d(\widetilde{f(x_{n_k})}, f(x_{n_k})) + d(f(x_{n_k}), f(x)) + d(f(x), \widetilde{f(x)}) \le \delta/8 + \delta/8 + \delta/8 = (3/8) \cdot \delta.$$
Hence, $\tilde{d} \le d(\widetilde{f(x_{n_k})}, \widetilde{f(x)}) + \delta/8 \le \delta/2$. So, if the algorithm did not stop before the index $n_k$, it stops at this point.

• Let us now show that the algorithm produces the desired value. Indeed, if $\tilde{d} \le \delta/2$, then $d(\widetilde{f(x_n)}, \widetilde{f(x)}) \le \tilde{d} + \delta/8 \le \delta/2 + \delta/8$, and
$$d(f(x_n), f(x)) \le d(f(x_n), \widetilde{f(x_n)}) + d(\widetilde{f(x_n)}, \widetilde{f(x)}) + d(\widetilde{f(x)}, f(x)) \le \delta/8 + (\delta/2 + \delta/8) + \delta/8 = (7/8) \cdot \delta < \delta.$$
Hence, due to our choice of $\delta$, we have $d(x, x_n) < 2^{-l}$.

So, to complete the description of the algorithm, we must describe how to compute $\delta$.
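The first part of this algorithm (searching the dense sequence until the estimated distance drops to $\delta/2$) can be sketched in Python for $X = [0, 1]$. Everything concrete here is an assumption made for illustration: the dense sequence is the enumeration of dyadic rationals, "computing with accuracy $\delta/8$" is simulated by rounding to a multiple of $\delta/8$, and the function is the hypothetical $f(x) = x^3$, for which $\delta = 2^{-3l}$ works because $|x - x'|^3 \le |x^3 - x'^3|$ on $[0, 1]$. The names `invert`, `approx`, and `cube` are illustrative only.

```python
from fractions import Fraction
from typing import Callable, Iterator

def dyadic_sequence() -> Iterator[Fraction]:
    """Everywhere dense sequence in [0, 1]: 0, 1, 1/2, 1/4, 3/4, 1/8, 3/8, ..."""
    yield Fraction(0)
    yield Fraction(1)
    level = 1
    while True:
        for num in range(1, 2**level, 2):
            yield Fraction(num, 2**level)
        level += 1

def approx(value: Fraction, acc: Fraction) -> Fraction:
    """Simulate 'computing value with accuracy acc' by rounding to a multiple of acc."""
    return round(value / acc) * acc

def invert(f: Callable[[Fraction], Fraction], fx: Fraction, delta: Fraction) -> Fraction:
    """Return the first x_n whose estimated distance d~ from f(x) is <= delta/2."""
    fx_t = approx(fx, delta / 8)                    # approximation of f(x)
    for xn in dyadic_sequence():
        fxn_t = approx(f(xn), delta / 8)            # approximation of f(x_n)
        d_t = approx(abs(fx_t - fxn_t), delta / 8)  # the estimate d~
        if d_t <= delta / 2:
            return xn
    raise RuntimeError("unreachable: the dense sequence is infinite")

def cube(x: Fraction) -> Fraction:
    """Hypothetical 1-1 continuous f on [0, 1]."""
    return x ** 3

l = 3
delta = Fraction(1, 2 ** (3 * l))
x = Fraction(2, 3)
xn = invert(cube, cube(x), delta)
print(xn, float(abs(xn - x)))
```

As in the proof, the returned $x_n$ satisfies $d(f(x_n), f(x)) < \delta$ and hence $d(x_n, x) < 2^{-l}$.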

We must find $\delta$ such that if $d(f(x), f(x')) < \delta$, then $d(x, x') < 2^{-l}$. To find this $\delta$, let us choose an integer $p$. Since $f$ is constructively continuous, we can compute a value $\eta > 0$ such that if $d(x, x') \le \eta$, then $d(f(x), f(x')) \le 2^{-p}$. Let us take $\beta = \min(\eta, 2^{-p})$. For this choice of $\beta$, if $d(x, x') \le \beta$, then $d(x, x') \le 2^{-p}$ and $d(f(x), f(x')) \le 2^{-p}$. Let us find a $\beta$-net $x^{(1)}, \ldots, x^{(m)}$ for $X$. This can be done similarly to the proof of Lemma 1; only, instead of referring to the existence of the desired $N$, we use the expert to produce such an $N$. For this $\beta$-net, we take all pairs $x^{(i)}, x^{(j)}$ for which $d(x^{(i)}, x^{(j)}) \ge 2^{-l} - 2\beta$, and find the smallest value $M$ of $d(f(x^{(i)}), f(x^{(j)}))$ over all such pairs. If $M > 2 \cdot 2^{-p}$, then we return $\delta = M - 2 \cdot 2^{-p}$. Otherwise, we increase $p$ by 1 and repeat the process. Let us prove that this part of the algorithm does produce a correct value of $\delta$ (and thus, that the entire algorithm is correct). Indeed:

• Let us first show that this algorithm stops. Indeed, due to Proposition 5, there exists a value $\delta' > 0$ for which, if $d(f(x), f(x')) < \delta'$, then $d(x, x') < (1/2) \cdot 2^{-l}$. So, if $d(x^{(i)}, x^{(j)}) \ge (1/2) \cdot 2^{-l}$, then $d(f(x^{(i)}), f(x^{(j)})) \ge \delta'$. Hence, if we take $p$ so large that $2^{-p} < \min(\delta'/2, (1/4) \cdot 2^{-l})$, then from $d(x^{(i)}, x^{(j)}) \ge 2^{-l} - 2\beta$ and from $\beta \le 2^{-p} < (1/4) \cdot 2^{-l}$, we can conclude that $d(x^{(i)}, x^{(j)}) > (1/2) \cdot 2^{-l}$, and therefore, that $M \ge \delta' > 2 \cdot 2^{-p}$.

• Let us now show that if the algorithm stops, then we get the desired $\delta$. Indeed, let $d(f(x), f(x')) < M - 2 \cdot 2^{-p}$. Since the elements $x^{(i)}$ form a $\beta$-net for $X$, there exist elements $x^{(i)}$ and $x^{(j)}$ that are $\beta$-close to $x$ and $x'$, respectively. Due to the choice of $\beta$, we can conclude that $d(f(x), f(x^{(i)})) \le 2^{-p}$ and $d(f(x'), f(x^{(j)})) \le 2^{-p}$. Hence, $d(f(x^{(i)}), f(x^{(j)})) \le d(f(x), f(x')) + 2 \cdot 2^{-p} < M$. By the definition of $M$, this means that $d(x^{(i)}, x^{(j)}) < 2^{-l} - 2\beta$. Therefore, $d(x, x') \le d(x^{(i)}, x^{(j)}) + 2\beta < 2^{-l}$.

So, the second part of the algorithm produces a correct $\delta$. Q.E.D.
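The search for $\delta$ can also be sketched in Python for $X = [0, 1]$. Assumptions (not from the text): the hypothetical map $f(x) = (x + x^2)/2$, whose modulus of continuity $|f(x) - f(x')| \le (3/2)\,|x - x'|$ is taken as known; an explicit grid with spacing $\beta$ plays the role of the $\beta$-net (so no expert is needed on this compact $X$); and exact rational arithmetic. Because $f$ can move by up to $2^{-p}$ inside a $\beta$-ball, the sketch subtracts the margin $2 \cdot 2^{-p}$ from $M$, which is what the correctness argument requires. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from typing import List

def f(x: Fraction) -> Fraction:
    """Hypothetical 1-1 continuous map on X = [0, 1]."""
    return (x + x * x) / 2

LIPSCHITZ = Fraction(3, 2)  # assumed known: |f(x) - f(x')| <= (3/2) |x - x'| on [0, 1]

def beta_net(beta: Fraction) -> List[Fraction]:
    """Grid on [0, 1] with spacing beta; every point of [0, 1] is beta-close to it."""
    n = int(1 / beta) + 1
    return [min(Fraction(1), k * beta) for k in range(n + 1)]

def find_delta(l: int) -> Fraction:
    """Search for delta with d(f(x), f(x')) < delta  =>  d(x, x') < 2^(-l)."""
    target = Fraction(1, 2**l)
    p = 1
    while True:
        eps_p = Fraction(1, 2**p)
        eta = eps_p / LIPSCHITZ          # d(x, x') <= eta  =>  d(f(x), f(x')) <= 2^(-p)
        beta = min(eta, eps_p)
        net = beta_net(beta)
        gaps = [abs(f(a) - f(b)) for a, b in combinations(net, 2)
                if abs(a - b) >= target - 2 * beta]
        M = min(gaps)                    # non-empty: the pair (0, 1) always qualifies
        if M > 2 * eps_p:                # margin: f moves by <= 2^(-p) inside a beta-ball
            return M - 2 * eps_p
        p += 1

delta = find_delta(2)
print(delta)
```

For $l = 2$ the loop stops at $p = 5$; one can check on a fine grid that the returned $\delta$ indeed forces $d(x, x') < 1/4$.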
Proof of Proposition 7. This proof is similar to the proof of Proposition 2. Let us fix $M$. As $A_n$, let us take the set of all testing sequences $s \in S$ for which the method $M$ passes the first $n$ tests, but fails some other test. Then, it is easy to show that $A_n \supseteq A_{n+1}$ for all $n$. Since the testing method is assumed to be complete, we have $\bigcap A_n = \emptyset$. Therefore, there exists $N$ for which $A_N \cap T = \emptyset$, i.e., for which all not abnormal testing sequences belong to the complement of $A_N$. Due to our definition of $A_N$, $s \notin A_N$ means that if the method passes the first $N$ tests, then this method is correct. Q.E.D.

Proof of Proposition 8. As $A_n$, we will take the set of all sequences for which, for some $k$, $c(x_k, \ldots, x_{k+d-1}) < 2^{-n}$ and $d(x_k, x) > \varepsilon$. Clearly, $A_n \supseteq A_{n+1}$.

Let us show that the intersection $\bigcap A_n$ is empty. Indeed, suppose that the sequence $\{x_k\}$ belongs to this intersection. This means that for every $n$, there exists a $k(n)$ such that $c(x_{k(n)}, \ldots, x_{k(n)+d-1}) < 2^{-n}$ and $d(x_{k(n)}, x) > \varepsilon$. If some value $k$ is equal to $k(n)$ for infinitely many $n$, this means that $c(x_k, \ldots, x_{k+d-1}) < 2^{-n}$ for infinitely many $n$ and hence, that $c(x_k, \ldots, x_{k+d-1}) = 0$. From the definition of a stopping criterion, it then follows that $x_k = x$, so $d(x_k, x) = 0 < \varepsilon$, which contradicts $d(x_k, x) > \varepsilon$. Hence, every value $k$ occurs as $k(n)$ only finitely many times, so $k(n) \to \infty$, and (since the sequence $x_k$ converges to $x$) $d(x_{k(n)}, x) \to 0$, which contradicts $d(x_{k(n)}, x) > \varepsilon$ for all $n$. This contradiction shows that the intersection is empty.

So, there exists an $N$ for which $A_N \cap T = \emptyset$. Hence, we can take $\delta = 2^{-N}$.
Q.E.D.

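Proof of Proposition 9. As $A_n$, we take the set of all computations $u$ for which $n < t(u) < \infty$. Clearly, $A_n \supseteq A_{n+1}$, and $\bigcap A_n = \emptyset$, since a computation with a finite running time $t(u)$ does not belong to $A_n$ for $n \ge t(u)$. Then, there exists $N$ such that $A_N \cap T = \emptyset$. Hence, we can take $N$ as the desired $T_0$.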

Proof of Proposition 10. As $A_n$, we take the set of all polynomial-time algorithms $U \in \mathcal{U}$ for which $t_U(x) > n \cdot (\mathrm{len}(x))^n$ for some input $x$. Clearly, $A_n \supseteq A_{n+1}$. Let us prove that $\bigcap A_n = \emptyset$.

Indeed, let us take an arbitrary algorithm $U$ from $\mathcal{U}$ and show that it does not belong to the intersection $\bigcap A_n$. Every algorithm from $\mathcal{U}$ is time-polynomial, i.e., $t_U(x) \le c \cdot (\mathrm{len}(x))^k$ for some $c$ and $k$. Therefore, for $n = \max(\lceil c \rceil, k)$, we have $U \notin A_n$. Hence, $U \notin \bigcap A_n$, and therefore, $\bigcap A_n = \emptyset$.

So, there exists an $N$ for which $A_N \cap T = \emptyset$. This means that if $U \in T$, then $t_U(x) \le N \cdot (\mathrm{len}(x))^N$ for all inputs $x$. So, we can take $C = K = N$. Q.E.D.

Proof of Proposition 11. As $A_n$, we take $\{u \mid |f(u)| > n\}$. Then, $A_n \supseteq A_{n+1}$, $\bigcap A_n = \emptyset$, and hence, there exists an $N$ for which $A_N \cap T = \emptyset$. This means that if $u \in T$, then $|f(u)| \le N$. Q.E.D.

Proof of Proposition 12. Take $A_n = \{u \mid f(u) \ne 0 \ \&\ |f(u)| < 2^{-n}\}$. Then, $A_n \supseteq A_{n+1}$, $\bigcap A_n = \emptyset$, and so, there exists an $N$ for which $A_N \cap T = \emptyset$. So, we can take $\varepsilon = 2^{-N}$. Q.E.D.

Proof of Proposition 13. Let $s_1, s_2 \in T$ and $s_1 \ne s_2$. Then, $d(s_1, s_2) > 0$. Since $X$ is definably separable, there exists a definable everywhere dense sequence $x_m$. In particular, there exists an $m$ for which $d(s_1, x_m) < (1/2) \cdot d(s_1, s_2)$. From the triangle inequality, it easily follows that $d(s_2, x_m) \ge d(s_1, s_2) - d(s_1, x_m) > (1/2) \cdot d(s_1, s_2)$. So, $d(s_2, x_m) > d(s_1, x_m)$. Inside the open interval $(d(s_1, x_m), d(s_2, x_m))$, there exists a rational (and hence definable) point $r$.

Let us take $A_n = \{x \mid |d(x, x_m) - r| < 2^{-n} \ \&\ d(x, x_m) \ne r\}$. This is a decreasing sequence, and $\bigcap A_n = \emptyset$, so there exists an $N$ for which $A_N \cap T = \emptyset$. This means that if $x \in T$, then either $d(x, x_m) \le r - 2^{-N}$, or $d(x, x_m) = r$, or $d(x, x_m) \ge r + 2^{-N}$. So, we can take $S_1 = \{x \mid d(x, x_m) < r - (1/2) \cdot 2^{-N}\}$ and $S_2 = \{x \mid d(x, x_m) > r - (1/4) \cdot 2^{-N}\}$. Both sets are open, $S_1 \cap S_2 = \emptyset$, $s_i \in S_i$, and $T \subseteq S_1 \cup S_2$. So, $T$ is indeed completely disconnected. Q.E.D.
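The separation constructed in this proof can be traced on a small example. This Python sketch assumes $X = \mathbb{R}$ with $d(x, y) = |x - y|$, the dyadic rationals as the dense sequence, and a finite sample $T$, so the existence of $N$ comes from finiteness rather than from typicality; `separate` is a hypothetical helper, and the sets it returns are the traces of the open sets $S_1$, $S_2$ on $T$.

```python
from fractions import Fraction
from typing import Iterator, List, Set, Tuple

def dyadic_sequence() -> Iterator[Fraction]:
    """Everywhere dense sequence in [0, 1]: 0, 1, 1/2, 1/4, 3/4, 1/8, 3/8, ..."""
    yield Fraction(0)
    yield Fraction(1)
    level = 1
    while True:
        for num in range(1, 2**level, 2):
            yield Fraction(num, 2**level)
        level += 1

def separate(T: List[Fraction], s1: Fraction, s2: Fraction) -> Tuple[Set[Fraction], Set[Fraction]]:
    """Split T into S1 (containing s1) and S2 (containing s2), following the proof."""
    gap = abs(s1 - s2)
    center = next(x for x in dyadic_sequence() if abs(s1 - x) < gap / 2)   # x_m
    r = (abs(s1 - center) + abs(s2 - center)) / 2   # rational, strictly inside the interval
    N = 0
    while any(0 < abs(abs(t - center) - r) < Fraction(1, 2**N) for t in T):
        N += 1                                      # A_N no longer meets T
    S1 = {t for t in T if abs(t - center) < r - Fraction(1, 2**(N + 1))}
    S2 = {t for t in T if abs(t - center) > r - Fraction(1, 2**(N + 2))}
    return S1, S2

T = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(1)]
S1, S2 = separate(T, Fraction(1, 3), Fraction(2, 3))
print(sorted(S1), sorted(S2))
```

The two parts are disjoint, cover $T$, and separate $s_1$ from $s_2$, exactly as in the proof.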

Proof of Proposition 14. Since $G$ is non-trivial, there exist an element $x \in T$ and $g \in G$ for which $a_g(x) \ne x$. For this $x \in T$, the orbit $Gx = \{a_g(x) \mid g \in G\}$ is a continuous image of the connected set $G$ and is, therefore, connected. It contains at least two points ($x$ and $a_g(x)$). Since $T$ is completely disconnected, $T$ cannot contain a connected subset different from a single point. Hence, $Gx \not\subseteq T$. This means that there exists a $g$ for which $a_g(x) \notin T$. So, $x \in T$ and $a_g(x) \notin T$; hence, $a_g(T) \ne T$. Q.E.D.

Proof of Lemma 2. Indeed, let $A_n$ be a definable sequence of subsets of $Y$ for which $A_n \supseteq A_{n+1}$ and $\bigcap A_n = \emptyset$. Then, for $B_n = f^{-1}(A_n)$ (a definable sequence, since $f$ is definable), we have $B_n \supseteq B_{n+1}$. If $x \in \bigcap B_n$, then $f(x) \in A_n$ for all $n$, so $f(x) \in \bigcap A_n$, which contradicts our assumption that $\bigcap A_n = \emptyset$. Hence, $\bigcap B_n = \emptyset$. Since $T$ is a set of typical elements, we can conclude that there exists an $N$ for which $B_N \cap T = f^{-1}(A_N) \cap T = \emptyset$.

Let us show that $A_N \cap f(T) = \emptyset$. Indeed, suppose that there exists an element $y \in A_N \cap f(T)$. Since $y \in f(T)$, there exists an $x \in T$ for which $y = f(x)$. This $x$ belongs both to $T$ and to $f^{-1}(A_N) = B_N$, which contradicts our choice of $N$. So, such an element $y$ is impossible. Hence, $A_N \cap f(T) = \emptyset$.
Q.E.D.

Proof of Proposition 16. We will prove this Proposition by contradiction. Assume that $T = T_1 \times T_2$ is a set of typical elements of $X \times X$ that is invariant w.r.t. permutations. Since $T$ is invariant w.r.t. permutations, we have $T_1 = T_2$.

If we take $A_n = \{(x_1, x_2) \mid d(x_1, x_2) < 2^{-n} \ \&\ x_1 \ne x_2\}$, then we can conclude that there exists an $N$ for which $A_N \cap T = \emptyset$. This means that if $(x_1, x_2) \in T$ and $x_1 \ne x_2$, then $d(x_1, x_2) \ge 2^{-N}$. Since $T = T_1 \times T_2$ and $T_1 = T_2$, we can reformulate this condition as follows: if $x_1, x_2 \in T_1$ and $x_1 \ne x_2$, then $d(x_1, x_2) \ge 2^{-N}$. So, every two distinct elements of $T_1$ are at least $2^{-N}$ apart.

The set $T_1$ is a projection of $T$ on $X$: $T_1 = \pi_1(T)$, where $\pi_1: X \times X \to X$ is a definable mapping (defined as $(x_1, x_2) \mapsto x_1$). So, due to Lemma 2, $T_1$ is a set of typical elements of $X$.

Due to Lemma 1, this set $T_1$ is pre-compact. In a pre-compact set, there can be at most finitely many elements that are pairwise at least $2^{-N}$ apart. So, $T_1$ is finite. Hence, $T = T_1 \times T_1$ is also finite, and this contradicts our assumption that $T$ is infinite. Q.E.D.
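The finiteness step ("in a pre-compact set, there can be at most finitely many elements that are pairwise at least $2^{-N}$ apart") can be seen concretely for the assumed case $X = [0, 1]$: after sorting, consecutive gaps of such points are each at least $2^{-N}$ and add up to at most 1, so there are at most $2^N + 1$ points. The Python sketch below, with a hypothetical helper `greedy_separated`, greedily extracts a separated subset from a fine grid and confirms the bound.

```python
from fractions import Fraction
from typing import List, Sequence

def greedy_separated(points: Sequence[Fraction], sep: Fraction) -> List[Fraction]:
    """Scan sorted points, keeping each one that is >= sep away from the last kept one."""
    kept: List[Fraction] = []
    for x in sorted(points):
        if not kept or x - kept[-1] >= sep:
            kept.append(x)
    return kept

N = 3
sep = Fraction(1, 2**N)
grid = [Fraction(k, 1000) for k in range(1001)]   # hypothetical fine sample of [0, 1]
kept = greedy_separated(grid, sep)
print(len(kept), 2**N + 1)
```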

Proof of Lemma 3. This statement immediately follows from Definition 3.

Proof of Proposition 18. We will prove this proposition by contradiction. Assume that $T = T_1 \times T_2$ is a set of typical elements of $X \times X$. Similarly to the proof of Proposition 16, we can prove that there exists an $\varepsilon > 0$ such that for all pairs $(x_1, x_2) \in T$, either $x_1 = x_2$, or $d(x_1, x_2) \ge \varepsilon$.

Due to Lemma 3, the intersection $T_1 \cap T_2 \subseteq T_1$ (if it is non-empty) is a set of typical elements of $X$. Any two distinct elements $a, b$ of this intersection form a pair $(a, b) \in T$, so $d(a, b) \ge \varepsilon$; similarly to the proof of Proposition 16, we can now conclude that this intersection is finite. Since $p$ is a non-atomic measure, we have $p(T_1 \cap T_2) = 0$. Hence, $p(T_1 \cup T_2) = p(T_1) + p(T_2) - p(T_1 \cap T_2) = p(T_1) + p(T_2)$. But $p(T_i) > 1/2$, so $p(T_1 \cup T_2) > 1$, which contradicts the fact that $p$ is a probability measure (and so, $p(T_1 \cup T_2) \le p(X) = 1$). The contradiction proves that our assumption is wrong, and $T$ is not a set of typical elements. Q.E.D.

Proof of Proposition 17. Assume that $T$ is factorizable, i.e., $T = T_1 \times T_2$. Then, due to Lemma 2 (applied to the projections $T_i = \pi_i(T)$), each of the sets $T_i$ is a set of typical elements. Since $T_1 \times T_2$ is $(p \times p)$-measurable, both its projections $T_i$ must be $p$-measurable sets, and $(p \times p)(T_1 \times T_2) = p(T_1) \cdot p(T_2)$. Since $p(T_j) \le 1$, we have $p(T_i) \ge (p \times p)(T_1 \times T_2) > 1/2$. So, both sets are majority sets. By Proposition 18, $T = T_1 \times T_2$ is then not a set of typical elements, which contradicts our assumption. Q.E.D.

Proof of Proposition 19. This proof is similar to the proof of Proposition 18. Indeed, suppose that $T_1 \times \cdots \times T_s$ is a set of typical elements of $X \times \cdots \times X$. Then, similarly to that proof, we conclude that the intersection of each pair $T_i$ and $T_j$ ($i \ne j$) is finite and therefore, $p(T_i \cap T_j) = 0$. Hence, $p(T_1 \cup \cdots \cup T_s) = p(T_1) + \cdots + p(T_s) > (1/s) + \cdots + (1/s) = 1$, which contradicts the assumption that $p$ is a probability measure. Q.E.D.
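The measure-theoretic core of Propositions 18 and 19 is the Bonferroni inequality $p(T_1 \cup \cdots \cup T_s) \ge \sum_i p(T_i) - \sum_{i<j} p(T_i \cap T_j)$: since the left-hand side is at most 1, sets with $p(T_i) > 1/s$ must have pairwise overlaps of total measure at least $\sum_i p(T_i) - 1 > 0$. The following Python sketch checks this on a hypothetical example with $s = 3$ intervals in $[0, 1]$ under the uniform measure; the intervals and helper names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

def length(iv):
    """Uniform measure of the interval iv = (lo, hi)."""
    lo, hi = iv
    return hi - lo

def overlap(a, b):
    """Measure of the intersection of two intervals (0 if they are disjoint)."""
    return max(Fraction(0), min(a[1], b[1]) - max(a[0], b[0]))

s = 3
T = [(Fraction(0), Fraction(35, 100)),          # hypothetical sets T_1, T_2, T_3
     (Fraction(33, 100), Fraction(68, 100)),
     (Fraction(66, 100), Fraction(1))]
assert all(length(iv) > Fraction(1, s) for iv in T)   # each p(T_i) > 1/s

total = sum(length(iv) for iv in T)                    # sum of p(T_i), which exceeds 1
pairwise = sum(overlap(a, b) for a, b in combinations(T, 2))
# Bonferroni: total - pairwise <= p(union) <= 1, hence pairwise >= total - 1 > 0.
print(total, pairwise, total - pairwise <= 1)
```

The overlaps cannot all have measure zero, which is exactly what the finite-intersection argument in the proofs rules out.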